enable hourly log ship mode #1025

Merged Sep 26, 2024 · 44 commits

Changes from 32 commits

Commits
4601fed
enable hourly log ship mode
FxKu Sep 11, 2024
80c5e4f
minor signature fix
FxKu Sep 11, 2024
59e1fe2
use curly braces around loop variables
FxKu Sep 11, 2024
f7f6ebe
try with unified bool strings
FxKu Sep 12, 2024
41aa2c5
better empty than false to make template work
FxKu Sep 12, 2024
399f05e
pass new flag to log env
FxKu Sep 12, 2024
e0638b4
wrong if on log_filename
FxKu Sep 12, 2024
0ec8dc9
add treatment in launch script, too
FxKu Sep 12, 2024
179c6c7
hour function
FxKu Sep 12, 2024
dd9c56a
shell script must compare to string
FxKu Sep 12, 2024
f3c026e
simplify how to configure hourly logging
FxKu Sep 12, 2024
dc98a22
set LOG_SHIP_HOURLY based on query in post_init script
FxKu Sep 12, 2024
637f090
add test for hourly log rotation
FxKu Sep 13, 2024
4c08136
fix shell script
FxKu Sep 13, 2024
624f3f4
change variable name
FxKu Sep 13, 2024
c04fd5f
once more place to compare string not bool
FxKu Sep 16, 2024
ee46fcd
make failed_authentication reflect log files rotation
FxKu Sep 16, 2024
5cd8d21
minor update to tests
FxKu Sep 16, 2024
b8c40fa
Update postgres-appliance/tests/test_spilo.sh
FxKu Sep 16, 2024
3017238
update hourly log test to check created views
FxKu Sep 16, 2024
3189e2b
Update postgres-appliance/tests/test_spilo.sh
FxKu Sep 16, 2024
c755a39
Update postgres-appliance/tests/test_spilo.sh
FxKu Sep 16, 2024
9e1af41
reflect feedback
FxKu Sep 16, 2024
8d68b3c
Merge branch 'hourly-logs' of github.com:FxKu/spilo into hourly-logs
FxKu Sep 16, 2024
9001bed
try with code from #955
FxKu Sep 17, 2024
e1ad37d
one quote too much
FxKu Sep 17, 2024
4c7bfe4
Merge branch 'master' into hourly-logs
FxKu Sep 17, 2024
12734ee
Merge branch 'master' into hourly-logs
FxKu Sep 18, 2024
54cc324
Merge branch 'master' into hourly-logs
FxKu Sep 19, 2024
6f4dfad
resolve conflict
FxKu Sep 19, 2024
073df65
Merge branch 'hourly-logs' of github.com:FxKu/spilo into hourly-logs
FxKu Sep 19, 2024
f476cca
add LOG_SHIP_HOURLY to log env dir
FxKu Sep 19, 2024
8b4ced7
adding hour padding of 0 to align with Postgres numbering
FxKu Sep 20, 2024
254fa9d
still provide daily log and auth views
FxKu Sep 20, 2024
3b82cea
drop existing foreign tables cascadingly when switching to hourly format
FxKu Sep 20, 2024
77ff352
update TS7 test
FxKu Sep 20, 2024
3539966
have to drop views to create foreign table
FxKu Sep 20, 2024
1107ccc
add dynamic DO block to delete postgres_log object
FxKu Sep 21, 2024
897d7fd
cannot cast regclass if relation does not exist
FxKu Sep 21, 2024
43502da
ok = was correct
FxKu Sep 21, 2024
83af8c5
add NULL check
FxKu Sep 21, 2024
ead6e0d
only accept true as value for LOG_SHIP_HOURLY
FxKu Sep 23, 2024
95f2742
move new check into get_placeholders
FxKu Sep 23, 2024
d216caf
Slightly rework log tbls cleanup
hughcapet Sep 23, 2024
3 changes: 2 additions & 1 deletion ENVIRONMENT.rst
@@ -91,7 +91,8 @@ Environment Configuration Settings
- **CALLBACK_SCRIPT**: the callback script to run on various cluster actions (on start, on stop, on restart, on role change). The script will receive the cluster name, connection string and the current action. See `Patroni <http://patroni.readthedocs.io/en/latest/SETTINGS.html?highlight=callback#postgresql>`__ documentation for details.
- **LOG_S3_BUCKET**: path to the S3 bucket used for PostgreSQL daily log files (i.e. foobar, without `s3://` prefix). Spilo will add `/spilo/{LOG_BUCKET_SCOPE_PREFIX}{SCOPE}{LOG_BUCKET_SCOPE_SUFFIX}/log/` to that path. Logs are shipped if this variable is set.
- **LOG_S3_TAGS**: map of key value pairs to be used for tagging files uploaded to S3. Values should be referencing existing environment variables e.g. ``{"ClusterName": "SCOPE", "Namespace": "POD_NAMESPACE"}``
-- **LOG_SHIP_SCHEDULE**: cron schedule for shipping compressed logs from ``pg_log`` (if this feature is enabled, '00 02 * * *' by default)
+- **LOG_SHIP_HOURLY**: if ``true``, Postgres log rotation is set to 1h and foreign tables are created for every hour of the day (upload schedule ``1 */1 * * *``)
+- **LOG_SHIP_SCHEDULE**: cron schedule for shipping compressed logs from ``pg_log`` (``1 0 * * *`` by default)
- **LOG_ENV_DIR**: directory to store environment variables necessary for log shipping
- **LOG_TMPDIR**: directory to store temporary compressed daily log files. PGROOT/../tmp by default.
- **LOG_S3_ENDPOINT**: (optional) S3 Endpoint to use with Boto3
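For intuition: the default `1 0 * * *` fires once a day at 00:01, while hourly mode pins uploads to minute 1 of every hour. A quick sanity check of the two expressions, sketched with the third-party `croniter` package (my choice for illustration; Spilo itself does not use it):

from datetime import datetime
from croniter import croniter  # third-party; pip install croniter

base = datetime(2024, 9, 26, 0, 0)
daily = croniter('1 0 * * *', base)
hourly = croniter('1 */1 * * *', base)

print(daily.get_next(datetime))   # 2024-09-26 00:01:00 -- once per day
print(hourly.get_next(datetime))  # 2024-09-26 00:01:00
print(hourly.get_next(datetime))  # 2024-09-26 01:01:00 -- minute 1 of every hour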
2 changes: 1 addition & 1 deletion postgres-appliance/bootstrap/maybe_pg_upgrade.py
@@ -1,9 +1,9 @@
#!/usr/bin/env python
+import glob
import logging
import os
import subprocess
import sys
-import glob

logger = logging.getLogger(__name__)

12 changes: 10 additions & 2 deletions postgres-appliance/launch.sh
@@ -34,8 +34,16 @@ fi
## Ensure all logfiles exist, most appliances will have
## a foreign data wrapper pointing to these files
for i in $(seq 0 7); do
-  if [ ! -f "${PGLOG}/postgresql-$i.csv" ]; then
-    touch "${PGLOG}/postgresql-$i.csv"
+  if [ "$LOG_SHIP_HOURLY" != "true" ]; then
+    if [ ! -f "${PGLOG}/postgresql-${i}.csv" ]; then
+      touch "${PGLOG}/postgresql-${i}.csv"
+    fi
+  else
+    for h in $(seq 0 23); do
+      if [ ! -f "${PGLOG}/postgresql-${i}-${h}.csv" ]; then
+        touch "${PGLOG}/postgresql-${i}-${h}.csv"
+      fi
+    done
  fi
done
chown -R postgres: "$PGROOT" "$RW_DIR/certs"
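For clarity, a small sketch (my illustration, not code from this PR) of the file set the loop above guarantees:

from itertools import product

# Sketch: the CSV files launch.sh pre-creates so the pglog foreign tables
# never point at a missing file.
def expected_log_files(hourly):
    if not hourly:
        return [f"postgresql-{i}.csv" for i in range(8)]  # Sunday appears as both 0 and 7
    # 8 days x 24 hours = 192 files; note that a later commit in this PR
    # (8b4ced7) zero-pads the hour to match the '%H' in log_filename
    return [f"postgresql-{i}-{h}.csv" for i, h in product(range(8), range(24))]

assert len(expected_log_files(False)) == 8
assert len(expected_log_files(True)) == 192  # the count the TS7 test checks against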
22 changes: 19 additions & 3 deletions postgres-appliance/scripts/configure_spilo.py
@@ -22,7 +22,7 @@
import pystache
import requests

-from spilo_commons import RW_DIR, PATRONI_CONFIG_FILE, append_extensions,\
+from spilo_commons import RW_DIR, PATRONI_CONFIG_FILE, append_extensions, \
get_binary_version, get_bin_dir, is_valid_pg_version, write_file, write_patroni_config


@@ -292,9 +292,19 @@ def deep_update(a, b):
logging_collector: 'on'
log_destination: csvlog
log_directory: ../pg_log
+{{#LOG_SHIP_HOURLY}}
+log_filename: 'postgresql-%u-%H.log'
+{{/LOG_SHIP_HOURLY}}
+{{^LOG_SHIP_HOURLY}}
log_filename: 'postgresql-%u.log'
+{{/LOG_SHIP_HOURLY}}
log_file_mode: '0644'
+{{#LOG_SHIP_HOURLY}}
+log_rotation_age: '1h'
+{{/LOG_SHIP_HOURLY}}
+{{^LOG_SHIP_HOURLY}}
log_rotation_age: '1d'
+{{/LOG_SHIP_HOURLY}}
log_truncate_on_rotation: 'on'
ssl: 'on'
{{#SSL_CA_FILE}}
@@ -580,6 +590,7 @@ def get_placeholders(provider):
placeholders.setdefault('CLONE_TARGET_INCLUSIVE', True)

placeholders.setdefault('LOG_GROUP_BY_DATE', False)
+placeholders.setdefault('LOG_SHIP_HOURLY', '')
placeholders.setdefault('LOG_SHIP_SCHEDULE', '1 0 * * *')
placeholders.setdefault('LOG_S3_BUCKET', '')
placeholders.setdefault('LOG_S3_ENDPOINT', '')
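The empty-string default is deliberate: pystache treats `''` as falsy, so the inverted `{{^LOG_SHIP_HOURLY}}` sections in the template above render and the daily settings apply, while the PR's later commits restrict the accepted enabling value to `true` (compare the commits "try with unified bool strings" and "better empty than false to make template work"). A minimal sketch of that behaviour:

import pystache

# Sketch: how the Mustache sections in the template react to the placeholder.
TPL = ("{{#LOG_SHIP_HOURLY}}log_rotation_age: '1h'{{/LOG_SHIP_HOURLY}}"
       "{{^LOG_SHIP_HOURLY}}log_rotation_age: '1d'{{/LOG_SHIP_HOURLY}}")

print(pystache.render(TPL, {'LOG_SHIP_HOURLY': ''}))      # -> log_rotation_age: '1d'
print(pystache.render(TPL, {'LOG_SHIP_HOURLY': 'true'}))  # -> log_rotation_age: '1h'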
@@ -773,6 +784,7 @@ def write_log_environment(placeholders):
os.makedirs(log_env['LOG_ENV_DIR'])

for var in ('LOG_TMPDIR',
+'LOG_SHIP_HOURLY',
'LOG_AWS_REGION',
'LOG_S3_ENDPOINT',
'LOG_S3_KEY',
@@ -1008,8 +1020,12 @@ def write_crontab(placeholders, overwrite):
' "{PGDATA}"').format(**placeholders)]

if bool(placeholders.get('LOG_S3_BUCKET')):
-        lines += [('{LOG_SHIP_SCHEDULE} nice -n 5 envdir "{LOG_ENV_DIR}"' +
-                   ' /scripts/upload_pg_log_to_s3.py').format(**placeholders)]
+        log_dir = placeholders.get('LOG_ENV_DIR')
+        schedule = placeholders.get('LOG_SHIP_SCHEDULE')
+        if placeholders.get('LOG_SHIP_HOURLY') == 'true':
+            schedule = '1 */1 * * *'
+        lines += [('{0} nice -n 5 envdir "{1}"' +
+                   ' /scripts/upload_pg_log_to_s3.py').format(schedule, log_dir)]

lines += yaml.safe_load(placeholders['CRONTAB'])
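In hourly mode the configured `LOG_SHIP_SCHEDULE` is therefore ignored and the upload job is pinned to minute 1 of every hour. A sketch of the resulting crontab entry; the `/run/etc/log.d` path is a made-up example value for `LOG_ENV_DIR`:

# Sketch: the crontab entry write_crontab() emits for log shipping.
def log_ship_cron_line(placeholders):
    schedule = placeholders.get('LOG_SHIP_SCHEDULE')
    if placeholders.get('LOG_SHIP_HOURLY') == 'true':
        schedule = '1 */1 * * *'  # hourly mode overrides LOG_SHIP_SCHEDULE
    return '{0} nice -n 5 envdir "{1}" /scripts/upload_pg_log_to_s3.py'.format(
        schedule, placeholders.get('LOG_ENV_DIR'))

print(log_ship_cron_line({'LOG_SHIP_SCHEDULE': '1 0 * * *',
                          'LOG_SHIP_HOURLY': 'true',
                          'LOG_ENV_DIR': '/run/etc/log.d'}))
# -> 1 */1 * * * nice -n 5 envdir "/run/etc/log.d" /scripts/upload_pg_log_to_s3.py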

40 changes: 28 additions & 12 deletions postgres-appliance/scripts/post_init.sh
@@ -147,19 +147,35 @@ if [ "$PGVER" -ge 14 ]; then
fi

# Sunday could be 0 or 7 depending on the format, we just create both
+LOG_SHIP_HOURLY=$(echo -e "SELECT text(current_setting('log_rotation_age') = '1h')" | psql -tAX -d postgres 2> /dev/null | tail -n 1)
for i in $(seq 0 7); do
-  echo "CREATE FOREIGN TABLE IF NOT EXISTS public.postgres_log_$i () INHERITS (public.postgres_log) SERVER pglog
-    OPTIONS (filename '../pg_log/postgresql-$i.csv', format 'csv', header 'false');
-  GRANT SELECT ON public.postgres_log_$i TO admin;
-
-  CREATE OR REPLACE VIEW public.failed_authentication_$i WITH (security_barrier) AS
-  SELECT *
-    FROM public.postgres_log_$i
-   WHERE command_tag = 'authentication'
-     AND error_severity = 'FATAL';
-  ALTER VIEW public.failed_authentication_$i OWNER TO postgres;
-  GRANT SELECT ON TABLE public.failed_authentication_$i TO robot_zmon;
-  "
+  if [ "$LOG_SHIP_HOURLY" != "true" ]; then
+    echo "CREATE FOREIGN TABLE IF NOT EXISTS public.postgres_log_${i} () INHERITS (public.postgres_log) SERVER pglog
+      OPTIONS (filename '../pg_log/postgresql-${i}.csv', format 'csv', header 'false');
+    GRANT SELECT ON public.postgres_log_${i} TO admin;
+
+    CREATE OR REPLACE VIEW public.failed_authentication_${i} WITH (security_barrier) AS
+    SELECT *
+      FROM public.postgres_log_${i}
+     WHERE command_tag = 'authentication'
+       AND error_severity = 'FATAL';
+    ALTER VIEW public.failed_authentication_${i} OWNER TO postgres;
+    GRANT SELECT ON TABLE public.failed_authentication_${i} TO robot_zmon;"
+  else
+    for h in $(seq 0 23); do
+      echo "CREATE FOREIGN TABLE IF NOT EXISTS public.postgres_log_${i}_${h} () INHERITS (public.postgres_log) SERVER pglog
+        OPTIONS (filename '../pg_log/postgresql-${i}-${h}.csv', format 'csv', header 'false');
+      GRANT SELECT ON public.postgres_log_${i}_${h} TO admin;
+
+      CREATE OR REPLACE VIEW public.failed_authentication_${i}_${h} WITH (security_barrier) AS
+      SELECT *
+        FROM public.postgres_log_${i}_${h}
+       WHERE command_tag = 'authentication'
+         AND error_severity = 'FATAL';
+      ALTER VIEW public.failed_authentication_${i}_${h} OWNER TO postgres;
+      GRANT SELECT ON TABLE public.failed_authentication_${i}_${h} TO robot_zmon;"
+    done
+  fi
done

cat _zmon_schema.dump
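Note that post_init.sh detects hourly mode by asking the running server (`current_setting('log_rotation_age') = '1h'`) instead of reading the environment. To make the resulting catalog concrete, a sketch (mine, not PR code) of the objects the loop emits:

# Sketch: names of the foreign tables and views post_init.sh creates.
def log_objects(hourly):
    days = range(8)  # Sunday can be logged as 0 or 7, so both exist
    suffixes = ([str(d) for d in days] if not hourly
                else [f"{d}_{h}" for d in days for h in range(24)])
    tables = [f"public.postgres_log_{s}" for s in suffixes]
    views = [f"public.failed_authentication_{s}" for s in suffixes]
    return tables, views

tables, views = log_objects(hourly=True)
assert len(tables) == len(views) == 192  # 8 x 24, as the TS7 test expects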
22 changes: 16 additions & 6 deletions postgres-appliance/scripts/upload_pg_log_to_s3.py
@@ -16,15 +16,25 @@
logger = logging.getLogger(__name__)


-def compress_pg_log():
-    yesterday = datetime.now() - timedelta(days=1)
-    yesterday_day_number = yesterday.strftime('%u')
+def get_file_names():
+    prev_interval = datetime.now() - timedelta(days=1)
+    prev_interval_number = prev_interval.strftime('%u')
+
+    if os.getenv('LOG_SHIP_HOURLY') == 'true':
+        prev_interval = datetime.now() - timedelta(hours=1)
+        prev_interval_number = prev_interval.strftime('%u-%H')

-    log_file = os.path.join(os.getenv('PGLOG'), 'postgresql-' + yesterday_day_number + '.csv')
-    archived_log_file = os.path.join(os.getenv('LOG_TMPDIR'), yesterday.strftime('%F') + '.csv.gz')
+    log_file = os.path.join(os.getenv('PGLOG'), 'postgresql-' + prev_interval_number + '.csv')
+    archived_log_file = os.path.join(os.getenv('LOG_TMPDIR'), prev_interval.strftime('%F-%H') + '.csv.gz')
+    return log_file, archived_log_file
+
+
+def compress_pg_log():
+    log_file, archived_log_file = get_file_names()

    if os.path.getsize(log_file) == 0:
-        logger.warning("Postgres log from yesterday '%s' is empty.", log_file)
+        logger.warning("Postgres log '%s' is empty.", log_file)
        sys.exit(0)

    try:
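As a sanity check on the naming logic, a sketch (not part of the PR) that fixes the clock and reproduces what get_file_names() computes. `%u` (ISO weekday, Monday=1) is a glibc strftime extension, available in the Spilo image:

from datetime import datetime, timedelta

now = datetime(2024, 9, 20, 14, 30)  # a Friday afternoon

# daily mode: previous day, weekday number only
prev = now - timedelta(days=1)
assert prev.strftime('%u') == '4'                  # Thursday -> postgresql-4.csv
assert prev.strftime('%F-%H') == '2024-09-19-14'   # -> 2024-09-19-14.csv.gz

# hourly mode: previous hour, weekday-hour pair (hour zero-padded by %H)
prev = now - timedelta(hours=1)
assert prev.strftime('%u-%H') == '5-13'            # Friday 13:00 -> postgresql-5-13.csv
assert prev.strftime('%F-%H') == '2024-09-20-13'   # -> 2024-09-20-13.csv.gz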
34 changes: 33 additions & 1 deletion postgres-appliance/tests/test_spilo.sh
@@ -227,6 +227,19 @@ function start_clone_with_basebackup_upgrade_container() {
-d spilo3
}

function start_clone_with_hourly_log_rotation() {
docker-compose run \
-e SCOPE=hourlylogs \
-e PGVERSION=16 \
-e LOG_SHIP_HOURLY="true" \
-e CLONE_SCOPE=upgrade2 \
-e CLONE_PGVERSION=14 \
-e CLONE_METHOD=CLONE_WITH_WALE \
-e CLONE_TARGET_TIME="$(next_minute)" \
--name "${PREFIX}hourlylogs" \
-d "spilo3"
}

function verify_clone_upgrade() {
local type=$2
local from_version=$3
@@ -241,13 +254,23 @@ function verify_archive_mode_is_on() {
[ "$archive_mode" = "on" ]
}

function verify_hourly_log_rotation() {
log_rotation_age=$(docker_exec "$1" "psql -U postgres -tAc \"SHOW log_rotation_age\"")
log_filename=$(docker_exec "$1" "psql -U postgres -tAc \"SHOW log_filename\"")
# we expect 8x24 foreign tables (plus the 8 daily tables that already exist when the cluster was initialized with daily rotation)
postgres_log_ftables=$(docker_exec "$1" "psql -U postgres -tAc \"SELECT count(*) FROM pg_foreign_table WHERE ftrelid::regclass::text LIKE 'postgres_log_%'\"")
postgres_failed_auth_views=$(docker_exec "$1" "psql -U postgres -tAc \"SELECT count(*) FROM pg_views WHERE viewname LIKE 'failed_authentication_%'\"")

[ "$log_rotation_age" = "1h" ] && [ "$log_filename" = "postgresql-%u-%H.log" ] && [ "$postgres_log_ftables" -ge 192 ] && [ "$postgres_failed_auth_views" -ge 192 ]
}

# TEST SUITE 1 - In-place major upgrade 12->13->...->16
# TEST SUITE 2 - Major upgrade 12->16 after wal-e clone (with CLONE_PGVERSION set)
# TEST SUITE 3 - PITR (clone with wal-e) with unreachable target (13+)
# TEST SUITE 4 - Major upgrade 12->13 after wal-e clone (no CLONE_PGVERSION)
# TEST SUITE 5 - Replica bootstrap with wal-e
# TEST SUITE 6 - Major upgrade 13->14 after clone with basebackup
# TEST SUITE 7 - Hourly log rotation
function test_spilo() {
# TEST SUITE 1
local container=$1
@@ -344,7 +367,7 @@ function test_spilo() {
local basebackup_container
basebackup_container=$(start_clone_with_basebackup_upgrade_container "$upgrade_container") # SCOPE=upgrade2 PGVERSION=14 CLONE: _SCOPE=upgrade
log_info "[TS6] Started $basebackup_container for testing major upgrade 13->14 after clone with basebackup"

wait_backup "$basebackup_container"

# TEST SUITE 1
# run_test test_pg_upgrade_to_16_check_failed "$container" # pg_upgrade --check complains about timescaledb
@@ -362,11 +385,20 @@
log_info "[TS5] Waiting for postgres to start in the $upgrade_replica_container and stream from primary..."
wait_all_streaming "$upgrade_container" 1

# TEST SUITE 7
local hourlylogs_container
hourlylogs_container=$(start_clone_with_hourly_log_rotation "$upgrade_container")
log_info "[TS7] Started $hourlylogs_container for testing hourly log rotation"

# TEST SUITE 6
log_info "[TS6] Testing in-place major upgrade 13->14 after clone with basebackup"
run_test verify_clone_upgrade "$basebackup_container" "basebackup" 13 14
run_test verify_archive_mode_is_on "$basebackup_container"

# TEST SUITE 7
find_leader "$hourlylogs_container"
log_info "[TS7] Testing correct setup with hourly log rotation"
run_test verify_hourly_log_rotation "$hourlylogs_container"
}

function main() {