Fix regression for pool creation timeout retry #887

tiagolobocastro · 2024-11-20T00:19:01Z

test(pool): create on very large or very slow disks

Uses LVM Lvols as backend devices for the pool.
We suspend these before pool creation, allowing us to simulate slow
pool creation.
This test ensures that pool creation completes by itself, and also
that a client can complete it by calling create again.

Signed-off-by: Tiago Castro <tiagolobocastro@gmail.com>

fix: allow pool creation to complete asynchronously

When the initial create gRPC times out, the data-plane may still be creating
the pool in the background, which can happen for very large pools.
Rather than assume failure, we allow the creation to complete in the background
for up to a large, arbitrary amount of time. If the pool creation completes
within that window, we retry the creation flow.
We don't simply use very large timeouts because the gRPC operations are
currently sequential, mostly for historical reasons.
Now that the data-plane allows concurrent calls, we should also allow
this on the control-plane.
TODO: allow concurrent node operations

Signed-off-by: Tiago Castro <tiagolobocastro@gmail.com>
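
The flow described in this commit can be illustrated with a minimal, std-only sketch. This is not the code from this PR: `DataPlane`, `CreateError`, `create_pool` and `create_with_async_completion` are hypothetical names, the data-plane is simulated with a thread, and the durations are arbitrary. It only shows the idea of treating a timeout as "possibly still in progress", waiting up to a bounded window, and retrying the create once the pool shows up.

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical error type for this sketch (not the control-plane's real errors).
#[derive(Debug, PartialEq)]
enum CreateError {
    /// The request deadline elapsed; the pool may still be being created.
    Timeout,
    /// The background completion window elapsed without the pool appearing.
    GaveUp,
}

/// Hypothetical stand-in for a data-plane node: it only records existing pools.
#[derive(Clone, Default)]
struct DataPlane {
    pools: Arc<Mutex<Vec<String>>>,
}

impl DataPlane {
    fn has_pool(&self, name: &str) -> bool {
        self.pools.lock().unwrap().iter().any(|p| p.as_str() == name)
    }

    /// Starts creating the pool in the background (taking `work` to finish) and
    /// waits at most `timeout` for it. Creating an existing pool succeeds at once.
    fn create_pool(&self, name: &str, work: Duration, timeout: Duration) -> Result<(), CreateError> {
        if self.has_pool(name) {
            return Ok(());
        }
        let pools = Arc::clone(&self.pools);
        let owned = name.to_string();
        // The creation keeps running even if the caller stops waiting.
        thread::spawn(move || {
            thread::sleep(work);
            pools.lock().unwrap().push(owned);
        });
        let deadline = Instant::now() + timeout;
        while Instant::now() < deadline {
            if self.has_pool(name) {
                return Ok(());
            }
            thread::sleep(Duration::from_millis(5));
        }
        Err(CreateError::Timeout)
    }
}

/// On timeout, do not assume failure: wait up to `max_wait` for the pool to
/// appear, then retry the creation flow, which now completes quickly.
fn create_with_async_completion(
    dp: &DataPlane,
    name: &str,
    work: Duration,
    timeout: Duration,
    max_wait: Duration,
) -> Result<(), CreateError> {
    match dp.create_pool(name, work, timeout) {
        Err(CreateError::Timeout) => {
            let deadline = Instant::now() + max_wait;
            while Instant::now() < deadline {
                if dp.has_pool(name) {
                    return dp.create_pool(name, work, timeout);
                }
                thread::sleep(Duration::from_millis(5));
            }
            Err(CreateError::GaveUp)
        }
        other => other,
    }
}

fn main() {
    let dp = DataPlane::default();
    // The create takes 100ms but the request only waits 20ms, so it times out first.
    let result = create_with_async_completion(
        &dp,
        "pool-1",
        Duration::from_millis(100),
        Duration::from_millis(20),
        Duration::from_secs(2),
    );
    println!("create result: {:?}", result);
}
```

In this sketch the retry is only issued once the pool is visible, so the second `create_pool` call returns through the "already exists" path rather than starting a second creation.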

fix: check for correct not found error code

A previous fix ended up not working correctly because it was merged
incorrectly, somehow!

Signed-off-by: Tiago Castro <tiagolobocastro@gmail.com>

chore: update terraform node prep

Pull the Release key from a recent k8s version since the old keys are no
longer valid.
This will have to be updated from time to time.

Signed-off-by: Tiago Castro <tiagolobocastro@gmail.com>

tiagolobocastro · 2024-11-20T00:19:13Z

Resolves openebs/mayastor#1772


tiagolobocastro added 2 commits November 18, 2024 12:44

chore: update terraform node prep

8333853

fix: check for correct not found error code

8c2b226

tiagolobocastro requested review from sinhaashish, Abhinandan-Purkait, abhilashshetty04 and dsharma-dc November 20, 2024 00:19

dsharma-dc approved these changes Nov 20, 2024

control-plane/agents/src/bin/core/controller/resources/operations_helper.rs (outdated, resolved)

tiagolobocastro force-pushed the pool-timeout branch from cc56335 to 2d71eb4 Compare November 21, 2024 12:04

tiagolobocastro force-pushed the pool-timeout branch from 2d71eb4 to 9607994 Compare November 21, 2024 12:31
