
Fix regression for pool creation timeout retry #887

Open, wants to merge 4 commits into develop


tiagolobocastro
Contributor

test(pool): create on very large or very slow disks

Uses LVM Lvols as backend devices for the pool.
We suspend these before pool creation, allowing us to simulate slow
pool creation.
This test ensures that pool creation completes on its own, and that a
client can also complete it by calling create again.

Signed-off-by: Tiago Castro <tiagolobocastro@gmail.com>
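The test scenario above can be modelled in-process. This is a minimal, std-only Rust sketch, not the PR's actual test. It assumes the following hypothetical pieces:

- `SuspendableDevice` stands in for a suspended LVM LV: I/O blocks until the device is resumed.
- `DataPlane`, `create_pool` and `CreateError` are illustrative names.

The sketch walks through three steps:

1. The first create times out while the device is suspended.
2. Once the device is resumed, creation finishes on its own.
3. A second create call then succeeds.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a suspended LVM LV: I/O blocks while suspended.
#[derive(Default)]
pub struct SuspendableDevice {
    suspended: Mutex<bool>,
    resumed: Condvar,
}

impl SuspendableDevice {
    pub fn suspend(&self) {
        *self.suspended.lock().unwrap() = true;
    }

    pub fn resume(&self) {
        *self.suspended.lock().unwrap() = false;
        self.resumed.notify_all();
    }

    /// Blocks the calling thread for as long as the device is suspended.
    fn wait_io(&self) {
        let mut suspended = self.suspended.lock().unwrap();
        while *suspended {
            suspended = self.resumed.wait(suspended).unwrap();
        }
    }
}

#[derive(Debug, PartialEq)]
pub enum CreateError {
    Timeout,
}

/// Hypothetical data-plane: creation runs on its own thread and carries on
/// even after the caller has stopped waiting for it.
#[derive(Default)]
pub struct DataPlane {
    pools: Mutex<HashSet<String>>,
    creating: Mutex<HashSet<String>>,
}

impl DataPlane {
    pub fn has_pool(&self, name: &str) -> bool {
        self.pools.lock().unwrap().contains(name)
    }

    pub fn create_pool(
        self: &Arc<Self>,
        name: &str,
        dev: Arc<SuspendableDevice>,
        timeout: Duration,
    ) -> Result<(), CreateError> {
        if self.has_pool(name) {
            return Ok(()); // already created: calling create again completes it
        }
        // Start a background creation only if one isn't already running.
        if self.creating.lock().unwrap().insert(name.to_string()) {
            let dp = Arc::clone(self);
            let name = name.to_string();
            thread::spawn(move || {
                dev.wait_io(); // stalls while the backend device is suspended
                dp.pools.lock().unwrap().insert(name.clone());
                dp.creating.lock().unwrap().remove(&name);
            });
        }
        // Wait up to `timeout` (like a gRPC deadline), then report a timeout.
        let deadline = Instant::now() + timeout;
        while Instant::now() < deadline {
            if self.has_pool(name) {
                return Ok(());
            }
            thread::sleep(Duration::from_millis(5));
        }
        Err(CreateError::Timeout)
    }
}
```

In this model the background thread keeps running after the caller's deadline passes. That is the property the test relies on: a timed-out create is not the same as a failed one.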

fix: allow pool creation to complete asynchronously

When the initial create gRPC times out, the data-plane may still be creating
the pool in the background, which can happen for very large pools.
Rather than assume failure, we allow the creation to complete in the background
for up to a large, arbitrary amount of time. If the pool creation completes
within that window, we retry the creation flow.
We don't simply use very large timeouts because the gRPC operations are
currently sequential, mostly for historical reasons, so a single slow call
would hold up all the others.
Now that the data-plane allows concurrent calls, we should allow this on the
control-plane as well.
TODO: allow concurrent node operations

Signed-off-by: Tiago Castro <tiagolobocastro@gmail.com>
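One way to model the control-plane behaviour described above is to record a deadline for each pool whose create call timed out. A periodic reconcile step then checks those deadlines. This is a sketch under assumptions, not the control-plane's real API. All of the following are hypothetical:

- `PendingCreates` and `Outcome`
- the `pool_exists` and `retry_create` callbacks
- the budget value

A pool that appears within the budget has its creation flow retried. A pool that does not is eventually given up on.

```rust
use std::collections::BTreeMap;
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
pub enum Outcome {
    /// The pool appeared in time; its creation flow was retried.
    Retried(String),
    /// The budget expired without the pool appearing.
    GaveUp(String),
}

/// Hypothetical bookkeeping for pool creates whose gRPC call timed out.
pub struct PendingCreates {
    /// How long the data-plane may keep creating in the background
    /// (the "large arbitrary amount of time"; the value is an assumption).
    budget: Duration,
    /// BTreeMap keeps reconcile results in a deterministic order.
    deadlines: BTreeMap<String, Instant>,
}

impl PendingCreates {
    pub fn new(budget: Duration) -> Self {
        Self { budget, deadlines: BTreeMap::new() }
    }

    pub fn len(&self) -> usize {
        self.deadlines.len()
    }

    /// Called when the create call times out: don't treat it as a failure yet.
    /// A repeated timeout keeps the original deadline.
    pub fn on_timeout(&mut self, pool: &str, now: Instant) {
        self.deadlines.entry(pool.to_string()).or_insert(now + self.budget);
    }

    /// Non-blocking reconcile step, meant to be called periodically.
    pub fn reconcile<E, R>(&mut self, now: Instant, mut pool_exists: E, mut retry_create: R) -> Vec<Outcome>
    where
        E: FnMut(&str) -> bool,
        R: FnMut(&str),
    {
        let mut outcomes = Vec::new();
        self.deadlines.retain(|pool, deadline| {
            if pool_exists(pool.as_str()) {
                retry_create(pool.as_str()); // completed in the background: retry the flow
                outcomes.push(Outcome::Retried(pool.clone()));
                false
            } else if now >= *deadline {
                outcomes.push(Outcome::GaveUp(pool.clone()));
                false
            } else {
                true // still within budget, keep waiting
            }
        });
        outcomes
    }
}
```

Passing `now` explicitly keeps the logic deterministic and easy to test. Because `reconcile` never blocks, a slow pool does not stall other operations while it finishes.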

fix: check for correct not found error code

A previous fix ended up not working correctly because it was merged
incorrectly, somehow!

Signed-off-by: Tiago Castro <tiagolobocastro@gmail.com>

chore: update terraform node prep

Pull the Release key from a recent k8s version since the old keys are no
longer valid.
This will have to be updated from time to time.

Signed-off-by: Tiago Castro <tiagolobocastro@gmail.com>
@tiagolobocastro
Contributor Author

Resolves openebs/mayastor#1772
