fix: cherry-pick better local dev for ssh from gpdb close #746 #747

yihong0618 · 2024-12-03T10:26:10Z

Fixes #746

What does this PR do?

This PR cherry-pick three commits from gpdb for better local dev.

commit1: add start_new_session
commit2: do not need to ssh if developing cloudberry/gpdb in local env

commit3: base from coomit1

for commit2 benefit: as the test before, make test local can also speed up.

Type of Change

Bug fix (non-breaking change)
New feature (non-breaking change)
Breaking change (fix or feature with breaking changes)
Documentation update

Breaking Changes

Test Plan

Unit tests added/updated
Integration tests added/updated
Passed make installcheck
Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Followed contribution guide
Added/updated documentation
Reviewed code for security implications
Requested review from cloudberry committers

Additional Context

This comment should now re-trigger a CI run. The previous skip ci has been removed.

avamingli · 2024-12-03T10:38:23Z

@yihong0618 Thanks for doing this.

@edespino CI jobs succeed immediately?

edespino · 2024-12-03T10:41:34Z

@avamingli - Woah .. that is too fast!

edespino · 2024-12-03T10:43:05Z

It is probably due to the PR template. It has:

⚠️ To skip CI: Add [skip ci] to your PR title. Only use when necessary! ⚠️

yihong0618 · 2024-12-03T10:44:14Z

It is probably due to the PR template. It has:

⚠️ To skip CI: Add [skip ci] to your PR title. Only use when necessary! ⚠️

copy, will remember that next time

edespino · 2024-12-03T10:51:17Z

Sorry about that ... I will need to submit a workflow PR tomorrow to only allow a skip in the title.

leborchuk · 2024-12-03T12:26:37Z

Sorry, could you explain how start_new_session is used? I see the new parameter was added in 0aad1ca but it has the False value everywhere, how it is used?

yihong0618 · 2024-12-04T02:21:14Z

Sorry, could you explain how start_new_session is used? I see the new parameter was added in 0aad1ca but it has the False value everywhere, how it is used?

seems its not a necessary arg, but cherry-pick this commit to sync with upstream code is better

avamingli · 2024-12-04T02:53:58Z

Sorry, could you explain how start_new_session is used? I see the new parameter was added in 0aad1ca but it has the False value everywhere, how it is used?

seems its not a necessary arg, but cherry-pick this commit to sync with upstream code is better

Do we miss some commits around that param?

avamingli · 2024-12-04T08:15:13Z

Sorry, could you explain how start_new_session is used? I see the new parameter was added in 0aad1ca but it has the False value everywhere, how it is used?

seems its not a necessary arg, but cherry-pick this commit to sync with upstream code is better

Do we miss some commits around that param?

Have a search, the param is true in commit avamingli/gpdb@01bbcb6
which has not been cherry-picked yet.

yihong0618 · 2024-12-04T09:11:02Z

Sorry, could you explain how start_new_session is used? I see the new parameter was added in 0aad1ca but it has the False value everywhere, how it is used?

seems its not a necessary arg, but cherry-pick this commit to sync with upstream code is better

Do we miss some commits around that param?

Have a search, the param is true in commit avamingli/gpdb@01bbcb6 which has not been cherry-picked yet.

Thanks will cherry-pick it too

avamingli · 2024-12-04T10:15:39Z

Hi @yihong0618 , thanks for your help again.

I see commit: 0aad1ca has the same commit title and message with 12d8566 but different codes.

I guess GPDB has only one commit with that title and message, is there sth wrong?

yihong0618 · 2024-12-04T10:18:47Z

Hi @yihong0618 , thanks for your help again.

I see commit: 0aad1ca has the same commit title and message with 12d8566 but different codes.

I guess GPDB has only one commit with that title and message, is there sth wrong?

will try to figure maybe this weekend thanks~

Currently, gprecoverseg lacks the ability to handle interrupts. When the main gprecoverseg process is terminated, it leaves behind orphaned `gpsegrecovery, pg_rewind, or pg_basebackup` processes that the user would have to manually terminate. This commit aims to add signal handling capabilities to the gprecoverseg code so that we can gracefully terminate the gprecoverseg process by using appropriate signal handlers and performing necessary cleanup actions, instead of abruptly terminating only the main process. The overall process of termination using signal handlers is described below: - Signals are caught by the signal handlers in the `gprecoverseg` code. This signal handler then gets the PIDs of the `gpsegsetuprecovery/gpsegrecovery` process running on each segment host and sends a `SIGTERM` signal to them. - The `gpsegrecovery` process also has a signal handler that will be used to handle the `SIGTERM` signal sent by the main process. We do not need any signal handler for the `gpsegsetuprecovery` code as it does not spawn any new process and we can directly terminate it. - The signal handler in the `gpsegrecovery` code would then call `terminate_proc_tree` which would terminate the entire process tree of `gpsegrecovery` process, since it is the parent PID of all the child processes (either `pg_rewind/pg_basebackup/rsync`). We do not terminate the `gpsegrecovery` process here since we would need it to perform proper error handling and cleanup actions. - The above steps are repeated unitl the `WorkerPool` is done with its work. This is to make sure that we are not leaving behind any unfinished commands that might be waiting in the `WorkerPool` queue. - Termination of the `pg_rewind/pg_basebackup/rsync` process will raise an error, which will be propagated back to the user, and it would also trigger the necessary cleanup actions that are already in place whenever a failure occurs. The behavior of gprecoverseg on different termination scenarios will be as follows: 1. **Using CTRL-C**: This sends a `SIGINT` signal, and upon receiving this, the user would be given a prompt if they want to proceed further with the termination. ``` It is not recommended to terminate a recovery procedure midway. However, if you choose to proceed, you will need to run either gprecoverseg --differential or gprecoverseg -F to start a new recovery process the next time. Continue terminating gprecoverseg Yy|Nn (default=N): > ``` 2. **Using kill from the terminal**: This, by default, sends a `SIGTERM` signal, and upon receiving this signal, gprecoverseg would directly terminate without any confirmation from the user. 3. **SSH disconnections**: This sends a `SIGHUP` signal, and currently, this signal will be ignored as we do not want to terminate the recovery process due to such issues. The signal handlers are placed just before the recovery process is started so that it does not affect any other functionality of gprecoverseg (like rebalancing).

This commit adds a new parameter `start_new_session` to the Command class. When set to `True`, it spawns the processes created by the Popen object in a separate session from the parent process. This can be useful in cases where you want to affect only the parent process without affecting the child processes, such as sending a termination signal. By default the parameter is set to `False`.

…812) We can skip ssh connections if segments are on the same host with the coordinator. We can save some time when executing commands. E.g., with this patch, we don't ssh connection when creating the demo cluster and the creation time can be reduced from 18s to 10s on my laptop. Besides, we can skip setting up ssh keys for deploying the demo cluster.

yihong0618 · 2024-12-08T07:14:55Z

fixed

avamingli · 2024-12-08T07:18:47Z

fixed

Thanks for your help.

yihong0618 · 2024-12-08T07:57:09Z

failed diff

diff -I HINT: -I CONTEXT: -I GP_IGNORE: -U3 /__w/cloudberry/cloudberry/src/test/regress/expected/uao_compaction/full_stats.out /__w/cloudberry/cloudberry/src/test/regress/results/uao_compaction/full_stats.out
--- /__w/cloudberry/cloudberry/src/test/regress/expected/uao_compaction/full_stats.out	2024-12-07 23:38:36.283181352 -0800
+++ /__w/cloudberry/cloudberry/src/test/regress/results/uao_compaction/full_stats.out	2024-12-07 23:38:36.287181360 -0800
@@ -32,6 +32,6 @@
 SELECT relname, reltuples FROM pg_class WHERE relname = 'uao_full_stats_index';
        relname        | reltuples 
 ----------------------+-----------
- uao_full_stats_index |        85
+ uao_full_stats_index |       185
 (1 row)
 
diff -I HINT: -I CONTEXT: -I GP_IGNORE: -U3 /__w/cloudberry/cloudberry/src/test/regress/expected/uao_compaction/stats.out /__w/cloudberry/cloudberry/src/test/regress/results/uao_compaction/stats.out
--- /__w/cloudberry/cloudberry/src/test/regress/expected/uao_compaction/stats.out	2024-12-07 23:38:36.663182147 -0800
+++ /__w/cloudberry/cloudberry/src/test/regress/results/uao_compaction/stats.out	2024-12-07 23:38:36.667182155 -0800
@@ -38,7 +38,7 @@
 SELECT relname, reltuples FROM pg_class WHERE relname = 'uao_stats_index';
      relname     | reltuples 
 -----------------+-----------
- uao_stats_index |        88
+ uao_stats_index |       151
 (1 row)
 
 -- re-setup for next case

yjhjstz

LGTM

my-ship-it

LGTM

avamingli added the cherry-pick cherry-pick upstream commts label Dec 3, 2024

yihong0618 changed the title ~~Hy/cherry pick ssh~~ fix: cherry-pick better local dev for ssh from gpdb Dec 3, 2024

yihong0618 changed the title ~~fix: cherry-pick better local dev for ssh from gpdb~~ fix: cherry-pick better local dev for ssh from gpdb close #746 Dec 3, 2024

jnihal and others added 3 commits December 8, 2024 14:59

yihong0618 force-pushed the hy/cherry_pick_ssh branch from 77c7a28 to baedf1f Compare December 8, 2024 07:02

yjhjstz approved these changes Dec 9, 2024

View reviewed changes

my-ship-it approved these changes Dec 9, 2024

View reviewed changes

my-ship-it merged commit cd4a4d2 into apache:main Dec 9, 2024
10 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: cherry-pick better local dev for ssh from gpdb close #746 #747

fix: cherry-pick better local dev for ssh from gpdb close #746 #747

yihong0618 commented Dec 3, 2024 •

edited

Loading

avamingli commented Dec 3, 2024

edespino commented Dec 3, 2024

edespino commented Dec 3, 2024

yihong0618 commented Dec 3, 2024

edespino commented Dec 3, 2024

leborchuk commented Dec 3, 2024

yihong0618 commented Dec 4, 2024

avamingli commented Dec 4, 2024

avamingli commented Dec 4, 2024

yihong0618 commented Dec 4, 2024

avamingli commented Dec 4, 2024

yihong0618 commented Dec 4, 2024

yihong0618 commented Dec 8, 2024

avamingli commented Dec 8, 2024

yihong0618 commented Dec 8, 2024

yjhjstz left a comment

my-ship-it left a comment

fix: cherry-pick better local dev for ssh from gpdb close #746 #747

fix: cherry-pick better local dev for ssh from gpdb close #746 #747

Conversation

yihong0618 commented Dec 3, 2024 • edited Loading

What does this PR do?

Type of Change

Breaking Changes

Test Plan

Impact

Checklist

Additional Context

avamingli commented Dec 3, 2024

edespino commented Dec 3, 2024

edespino commented Dec 3, 2024

yihong0618 commented Dec 3, 2024

edespino commented Dec 3, 2024

leborchuk commented Dec 3, 2024

yihong0618 commented Dec 4, 2024

avamingli commented Dec 4, 2024

avamingli commented Dec 4, 2024

yihong0618 commented Dec 4, 2024

avamingli commented Dec 4, 2024

yihong0618 commented Dec 4, 2024

yihong0618 commented Dec 8, 2024

avamingli commented Dec 8, 2024

yihong0618 commented Dec 8, 2024

yjhjstz left a comment

Choose a reason for hiding this comment

my-ship-it left a comment

Choose a reason for hiding this comment

yihong0618 commented Dec 3, 2024 •

edited

Loading