Data deletion sometimes fails and there is no easy way to retry #430

yanokwa · 2023-05-16T20:40:15Z

In our testing, data deletions always worked, but in every production install I've upgraded, data deletion has failed and I have no idea why. It'd be nice to have a force option that runs the step at https://github.com/getodk/central/blob/master/files/postgres/upgrade-postgres.sh#L42.

root@ip-xx-xx-xx-xx:~/central# docker compose up postgres
[+] Running 1/1
 ⠿ Container central-postgres-1  Recreated                                                                                                                 0.1s
Attaching to central-postgres-1
central-postgres-1  | Tue 16 May 2023 08:32:00 PM GMT [upgrade-postgres.sh] Checking for existing upgrade marker file...
central-postgres-1  | Tue 16 May 2023 08:32:00 PM GMT [upgrade-postgres.sh] Upgrade has been run previously.
central-postgres-1  | Tue 16 May 2023 08:32:00 PM GMT [upgrade-postgres.sh] !!!
central-postgres-1  | Tue 16 May 2023 08:32:00 PM GMT [upgrade-postgres.sh] !!! WARNING: you still have old data from PostgreSQL 9.6
central-postgres-1  | Tue 16 May 2023 08:32:00 PM GMT [upgrade-postgres.sh] !!!
central-postgres-1  | Tue 16 May 2023 08:32:00 PM GMT [upgrade-postgres.sh] !!! This is taking up disk space: 311MB
central-postgres-1  | Tue 16 May 2023 08:32:00 PM GMT [upgrade-postgres.sh] !!!
central-postgres-1  | Tue 16 May 2023 08:32:00 PM GMT [upgrade-postgres.sh] !!! Continue with the instructions at https://docs.getodk.org/central-upgrade/
central-postgres-1  | Tue 16 May 2023 08:32:00 PM GMT [upgrade-postgres.sh] !!!
central-postgres-1  | Tue 16 May 2023 08:32:00 PM GMT [upgrade-postgres.sh] Complete.
central-postgres-1 exited with code 0

root@ip-172-31-18-74:~/central# touch ./files/postgres14/upgrade/delete-old-data \
   && docker compose up --abort-on-container-exit postgres
[+] Running 1/1
 ⠿ Container central-postgres-1  Recreated                                                                                                                 0.1s
Attaching to central-postgres-1
central-postgres-1  | Tue 16 May 2023 08:32:08 PM GMT [upgrade-postgres.sh] Checking for existing upgrade marker file...
central-postgres-1  | Tue 16 May 2023 08:32:08 PM GMT [upgrade-postgres.sh] Upgrade has been run previously.
central-postgres-1  | Tue 16 May 2023 08:32:08 PM GMT [upgrade-postgres.sh] Deleting old data...
central-postgres-1  | Tue 16 May 2023 08:32:08 PM GMT [upgrade-postgres.sh] !!!
central-postgres-1  | Tue 16 May 2023 08:32:08 PM GMT [upgrade-postgres.sh] !!! ERROR: file missing: delete_old_cluster.sh
central-postgres-1  | Tue 16 May 2023 08:32:08 PM GMT [upgrade-postgres.sh] !!!
central-postgres-1  | Tue 16 May 2023 08:32:08 PM GMT [upgrade-postgres.sh] !!! Upgrade may not have completed successfully.
central-postgres-1  | Tue 16 May 2023 08:32:08 PM GMT [upgrade-postgres.sh] !!!
central-postgres-1  | Tue 16 May 2023 08:32:08 PM GMT [upgrade-postgres.sh] !!! Old data will not be deleted.
central-postgres-1  | Tue 16 May 2023 08:32:08 PM GMT [upgrade-postgres.sh] !!!
central-postgres-1 exited with code 1
Aborting on container exit...
[+] Running 1/0
 ⠿ Container central-postgres-1  Stopped

In this case, I can log into Central fine. I can also run docker exec -it central-postgres14-1 psql -U odk -W odk -c "select version();" to confirm that I'm running 14.7.

The logs from a Central migration where the delete worked end like this:

Tue 16 May 2023 08:27:38 PM GMT [upgrade-postgres.sh] Upgrade complete.
Tue 16 May 2023 08:27:38 PM GMT [upgrade-postgres.sh] Complete.

The logs from a migration where the delete failed end like this:

Sat 25 Mar 2023 05:28:40 AM GMT [upgrade-postgres.sh] Upgrade complete.

It seems like delete_old_cluster.sh is somehow not available? And maybe that doesn't matter? Rather than checking for delete_old_cluster.sh, maybe we should do the delete whenever /postgres14-upgrade/upgrade-successful exists.

tianon/docker-postgres-upgrade#11 (comment) seems to suggest that delete_old_cluster.sh is created in a working directory in the container, which is not persisted and might go away.

It seems like the best solution is to check to see if the upgrade has completed and delete the data then.
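
A minimal sketch of that idea, not the actual contents of upgrade-postgres.sh: the function name delete_old_data_if_upgraded and its three arguments are hypothetical. It deletes the old data only when a deletion has been requested and the upgrade-successful marker exists, and it never looks for delete_old_cluster.sh.

```shell
#!/bin/sh
# Hypothetical sketch: the function name and its arguments are illustrative
# assumptions, not the real upgrade-postgres.sh.
#
# Usage: delete_old_data_if_upgraded MARKER_FILE REQUEST_FILE OLD_DATA_DIR
#   MARKER_FILE   written once the upgrade has completed successfully
#   REQUEST_FILE  created by the operator (with touch) to ask for deletion
#   OLD_DATA_DIR  directory holding the old PostgreSQL 9.6 data
delete_old_data_if_upgraded() {
  marker_file="$1"
  request_file="$2"
  old_data_dir="$3"

  # Refuse to run with an empty target rather than risk deleting the wrong path.
  if [ -z "$old_data_dir" ]; then
    echo "ERROR: no old data directory given." >&2
    return 1
  fi

  # Nothing to do unless the operator asked for deletion.
  if [ ! -f "$request_file" ]; then
    echo "Deletion not requested."
    return 0
  fi

  # The marker is the only evidence of a successful upgrade that we rely on.
  if [ ! -f "$marker_file" ]; then
    echo "ERROR: upgrade marker missing: $marker_file" >&2
    echo "Old data will not be deleted." >&2
    return 1
  fi

  echo "Deleting old data..."
  rm -rf -- "$old_data_dir"
  # Clear the request so that the next start does not repeat the deletion.
  rm -f -- "$request_file"
  echo "Old data deleted."
}
```

With a check like this, retrying a failed deletion would only require touching the request file again (./files/postgres14/upgrade/delete-old-data in the session above) and restarting the postgres container, assuming /postgres14-upgrade/upgrade-successful is the marker passed in.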

matthew-white · 2023-05-24T03:16:40Z

Closed by #431.

yanokwa changed the title ~~Data deletion sometimes fails and there is no easy way to retry.~~ Data deletion sometimes fails and there is no easy way to retry May 16, 2023

yanokwa mentioned this issue May 16, 2023

Remove unreliable check for successful migration #431

Merged

matthew-white added this to ODK Central May 17, 2023

github-project-automation bot moved this to 🕒 backlog in ODK Central May 17, 2023

matthew-white assigned yanokwa May 17, 2023

matthew-white moved this from 🕒 backlog to ✏️ in progress in ODK Central May 17, 2023

matthew-white added the ops Docker, nginx, ops to deploy Central label May 24, 2023

matthew-white closed this as completed May 24, 2023

github-project-automation bot moved this from ✏️ in progress to ✅ done in ODK Central May 24, 2023
