
Fix an idempotency issue in CreateVolume #99

Merged
rck merged 1 commit into piraeusdatastore:master from fix-retry-in-create-volume on Dec 3, 2020

Conversation

WanzenBug (Member)

In case creating a new volume takes longer than the gRPC timeout,
the provisioner could get into a state where volumes are considered
ready even though no resource was ever created. This commit fixes the
issue by ensuring that volumes are only considered ready after the
expected number of volumes has been placed.

The bug happened in cases where (sketched in the Go example below):

  • Create() successfully called saveVolume(), persisting the volume
    information as annotations, which is interpreted as "ready" resources
  • Immediately after, the gRPC timeout cancelled the request context,
    meaning the volume scheduler aborted before any resources could be
    assigned to nodes
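
A minimal, self-contained Go sketch of that racy ordering; the names (`provisioner`, `saveVolume`, `scheduleVolume`) and types are hypothetical stand-ins for illustration, not the actual repository code:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Volume is a hypothetical stand-in for the provisioner's volume record.
type Volume struct {
	ID        string
	Resources []string // resources placed on nodes by the scheduler
}

type provisioner struct{}

// saveVolume persists the volume info as annotations; readers interpret a
// saved volume as "ready" (this is the problematic assumption).
func (p *provisioner) saveVolume(ctx context.Context, vol *Volume) error {
	return nil
}

// scheduleVolume simulates a slow volume scheduler that honors context
// cancellation, like a real scheduler cut short by the gRPC deadline.
func (p *provisioner) scheduleVolume(ctx context.Context, vol *Volume) error {
	select {
	case <-time.After(100 * time.Millisecond):
		vol.Resources = append(vol.Resources, "node-1")
		return nil
	case <-ctx.Done():
		return ctx.Err() // aborted before any resource was assigned
	}
}

// Create reproduces the buggy order: annotations are saved first, so a
// timeout during scheduling leaves a "ready" volume with zero resources.
func (p *provisioner) Create(ctx context.Context, vol *Volume) error {
	if err := p.saveVolume(ctx, vol); err != nil {
		return err
	}
	return p.scheduleVolume(ctx, vol)
}

func main() {
	// A deadline shorter than the scheduling time triggers the bug.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Millisecond)
	defer cancel()

	p := &provisioner{}
	vol := &Volume{ID: "pvc-123"}
	err := p.Create(ctx, vol)
	fmt.Println(errors.Is(err, context.DeadlineExceeded), len(vol.Resources))
	// Prints "true 0": scheduling timed out with no resources placed, yet
	// the annotations already mark the volume as ready.
}
```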

A simple reordering of volumeScheduler.Create() and saveVolume() would
lead to a different issue, in which ever more volumes are placed but
never marked as ready. To prevent this, volumes that reference at least
as many resources as the parameters require are considered ready.

Note that this is a stopgap solution until the volume schedulers are
rewritten to be idempotent themselves.
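
A minimal sketch of that stopgap readiness check, again with hypothetical names (`Parameters.PlacementCount` and `Volume.Resources` are assumptions, not the repository's actual fields):

```go
package main

import "fmt"

// Volume and Parameters are hypothetical stand-ins for illustration.
type Volume struct {
	ID        string
	Resources []string // resources the volume currently references
}

type Parameters struct {
	PlacementCount int // number of replicas the parameters require
}

// ready treats a volume as ready only once it references at least as many
// resources as the parameters require, so annotations persisted by an
// interrupted Create() no longer count as ready on their own.
func ready(vol *Volume, params Parameters) bool {
	return len(vol.Resources) >= params.PlacementCount
}

func main() {
	params := Parameters{PlacementCount: 2}

	// Annotations were saved but scheduling was cut short: not ready, so a
	// retry places the missing resources instead of returning early.
	partial := &Volume{ID: "pvc-123", Resources: []string{"node-1"}}
	fmt.Println(ready(partial, params)) // false

	// Enough resources are referenced: a retry sees the volume as ready
	// and does not keep placing ever more replicas.
	placed := &Volume{ID: "pvc-123", Resources: []string{"node-1", "node-2"}}
	fmt.Println(ready(placed, params)) // true
}
```

This check sits between the two failure modes: it neither trusts annotations written before placement finished, nor re-places resources that an earlier, partially completed call already assigned.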

@WanzenBug WanzenBug requested a review from rck December 2, 2020 16:53
@WanzenBug WanzenBug marked this pull request as ready for review December 2, 2020 16:53
@rck (Member) commented Dec 3, 2020

LGTM. thanks

@rck rck merged commit c72c581 into piraeusdatastore:master Dec 3, 2020
@WanzenBug WanzenBug deleted the fix-retry-in-create-volume branch December 3, 2020 11:54