
Fix an idempotency issue in CreateVolume #99

Merged
rck merged 1 commit into piraeusdatastore:master from fix-retry-in-create-volume on Dec 3, 2020

Conversation

WanzenBug (Member)

In case creating a new volume takes longer than the gRPC timeout,
the provisioner could get into a state where volumes are considered
ready even though no resource was ever created. This commit fixes the
issue by ensuring that volumes are only considered ready after the
expected number of volumes has been placed.

The bug happened in cases where (sketched in the Go example below):

  • Create() successfully called saveVolume(), persisting the volume
    information as annotations, which is interpreted as "ready" resources
  • Immediately after, the gRPC timeout cancelled the request context,
    meaning the volume scheduler aborted before any resources could be
    assigned to nodes
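
A minimal, self-contained Go sketch of that racy ordering; the names (`provisioner`, `saveVolume`, `scheduleVolume`) and types are hypothetical stand-ins for illustration, not the actual repository code:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Volume is a hypothetical stand-in for the provisioner's volume record.
type Volume struct {
	ID        string
	Resources []string // resources placed on nodes by the scheduler
}

type provisioner struct{}

// saveVolume persists the volume info as annotations; readers interpret a
// saved volume as "ready" (this is the problematic assumption).
func (p *provisioner) saveVolume(ctx context.Context, vol *Volume) error {
	return nil
}

// scheduleVolume simulates a slow volume scheduler that honors context
// cancellation, like a real scheduler cut short by the gRPC deadline.
func (p *provisioner) scheduleVolume(ctx context.Context, vol *Volume) error {
	select {
	case <-time.After(100 * time.Millisecond):
		vol.Resources = append(vol.Resources, "node-1")
		return nil
	case <-ctx.Done():
		return ctx.Err() // aborted before any resource was assigned
	}
}

// Create reproduces the buggy order: annotations are saved first, so a
// timeout during scheduling leaves a "ready" volume with zero resources.
func (p *provisioner) Create(ctx context.Context, vol *Volume) error {
	if err := p.saveVolume(ctx, vol); err != nil {
		return err
	}
	return p.scheduleVolume(ctx, vol)
}

func main() {
	// A deadline shorter than the scheduling time triggers the bug.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Millisecond)
	defer cancel()

	p := &provisioner{}
	vol := &Volume{ID: "pvc-123"}
	err := p.Create(ctx, vol)
	fmt.Println(errors.Is(err, context.DeadlineExceeded), len(vol.Resources))
	// Prints "true 0": scheduling timed out with no resources placed, yet
	// the annotations already mark the volume as ready.
}
```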

A simple reordering of volumeScheduler.Create() and saveVolume() would
lead to a different issue, in which ever more volumes are placed but
never marked as ready. To prevent this, volumes that reference at least
as many resources as the parameters require are considered ready.

Note that this is a stopgap solution until the volume schedulers are
rewritten to be idempotent themselves.
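
A minimal sketch of that stopgap readiness check, again with hypothetical names (`Parameters.PlacementCount` and `Volume.Resources` are assumptions, not the repository's actual fields):

```go
package main

import "fmt"

// Volume and Parameters are hypothetical stand-ins for illustration.
type Volume struct {
	ID        string
	Resources []string // resources the volume currently references
}

type Parameters struct {
	PlacementCount int // number of replicas the parameters require
}

// ready treats a volume as ready only once it references at least as many
// resources as the parameters require, so annotations persisted by an
// interrupted Create() no longer count as ready on their own.
func ready(vol *Volume, params Parameters) bool {
	return len(vol.Resources) >= params.PlacementCount
}

func main() {
	params := Parameters{PlacementCount: 2}

	// Annotations were saved but scheduling was cut short: not ready, so a
	// retry places the missing resources instead of returning early.
	partial := &Volume{ID: "pvc-123", Resources: []string{"node-1"}}
	fmt.Println(ready(partial, params)) // false

	// Enough resources are referenced: a retry sees the volume as ready
	// and does not keep placing ever more replicas.
	placed := &Volume{ID: "pvc-123", Resources: []string{"node-1", "node-2"}}
	fmt.Println(ready(placed, params)) // true
}
```

This check sits between the two failure modes: it neither trusts annotations written before placement finished, nor re-places resources that an earlier, partially completed call already assigned.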

@WanzenBug WanzenBug requested a review from rck December 2, 2020 16:53
@WanzenBug WanzenBug marked this pull request as ready for review December 2, 2020 16:53
@rck (Member) commented Dec 3, 2020

LGTM. thanks

@rck rck merged commit c72c581 into piraeusdatastore:master Dec 3, 2020
@WanzenBug WanzenBug deleted the fix-retry-in-create-volume branch December 3, 2020 11:54