Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

provider/scaleway: fix scaleway_volume_attachment with count > 1 #9493

Merged
merged 6 commits into from
Oct 27, 2016
Merged

provider/scaleway: fix scaleway_volume_attachment with count > 1 #9493

merged 6 commits into from
Oct 27, 2016

Conversation

nicolai86
Copy link
Contributor

@nicolai86 nicolai86 commented Oct 20, 2016

since scaleway requires servers to be powered off to attach volumes to we need
to make sure that we don't power down a server twice, or power up a server while
it's supposed to be modified. classic race condition.

sadly terraform doesn't seem to sport serialization primitives for use cases like
this, but putting the code in question behind a sync.Mutex does the trick, too.

this PR also migrates waitForServerState over to terraform's internal retry primitive.

fixes #9417

TF_ACC=1 go test ./builtin/providers/scaleway -v -run=TestAccScaleway -timeout 120m
=== RUN   TestAccScalewayDataSourceBootscript_Basic
--- PASS: TestAccScalewayDataSourceBootscript_Basic (1.46s)
=== RUN   TestAccScalewayDataSourceBootscript_Filtered
--- PASS: TestAccScalewayDataSourceBootscript_Filtered (1.16s)
=== RUN   TestAccScalewayDataSourceImage_Basic
--- PASS: TestAccScalewayDataSourceImage_Basic (7.22s)
=== RUN   TestAccScalewayDataSourceImage_Filtered
--- PASS: TestAccScalewayDataSourceImage_Filtered (7.17s)
=== RUN   TestAccScalewayIP_importBasic
--- PASS: TestAccScalewayIP_importBasic (1.95s)
=== RUN   TestAccScalewaySecurityGroup_importBasic
--- PASS: TestAccScalewaySecurityGroup_importBasic (2.35s)
=== RUN   TestAccScalewayServer_importBasic
--- PASS: TestAccScalewayServer_importBasic (138.48s)
=== RUN   TestAccScalewayVolume_importBasic
--- PASS: TestAccScalewayVolume_importBasic (3.49s)
=== RUN   TestAccScalewayIP_Basic
--- PASS: TestAccScalewayIP_Basic (10.79s)
=== RUN   TestAccScalewaySecurityGroupRule_Basic
--- PASS: TestAccScalewaySecurityGroupRule_Basic (6.57s)
=== RUN   TestAccScalewaySecurityGroup_Basic
--- PASS: TestAccScalewaySecurityGroup_Basic (2.52s)
=== RUN   TestAccScalewayServer_Basic
--- PASS: TestAccScalewayServer_Basic (129.39s)
=== RUN   TestAccScalewayServer_SecurityGroup
--- PASS: TestAccScalewayServer_SecurityGroup (166.02s)
=== RUN   TestAccScalewayVolumeAttachment_Basic
--- PASS: TestAccScalewayVolumeAttachment_Basic (373.03s)
=== RUN   TestAccScalewayVolume_Basic
--- PASS: TestAccScalewayVolume_Basic (3.27s)
PASS
ok      github.com/hashicorp/terraform/builtin/providers/scaleway   854.886s

@tboerger
Copy link
Contributor

I will give it a try tomorrow

@dcharbonnier
Copy link

What do you think about the azure implementation that define a MutexKV in the provider and lock ressources. It take advantage of the terraform official helper mutexkv, that could be more "elegant" don't you think @nicolai86 ?


var currentState string
func waitForServerState(scaleway *api.ScalewayAPI, serverID string, targetState string) error {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@nicolai86
Copy link
Contributor Author

great find, @dcharbonnier . I'll adjust the PR

@nicolai86
Copy link
Contributor Author

tests still pass:

TF_ACC=1 go test ./builtin/providers/scaleway -v -run=TestAccScaleway -timeout 120m
=== RUN   TestAccScalewayDataSourceBootscript_Basic
--- PASS: TestAccScalewayDataSourceBootscript_Basic (1.38s)
=== RUN   TestAccScalewayDataSourceBootscript_Filtered
--- PASS: TestAccScalewayDataSourceBootscript_Filtered (1.20s)
=== RUN   TestAccScalewayDataSourceImage_Basic
--- PASS: TestAccScalewayDataSourceImage_Basic (7.30s)
=== RUN   TestAccScalewayDataSourceImage_Filtered
--- PASS: TestAccScalewayDataSourceImage_Filtered (7.37s)
=== RUN   TestAccScalewayIP_importBasic
--- PASS: TestAccScalewayIP_importBasic (2.24s)
=== RUN   TestAccScalewaySecurityGroup_importBasic
--- PASS: TestAccScalewaySecurityGroup_importBasic (2.38s)
=== RUN   TestAccScalewayServer_importBasic
--- PASS: TestAccScalewayServer_importBasic (116.65s)
=== RUN   TestAccScalewayVolume_importBasic
--- PASS: TestAccScalewayVolume_importBasic (3.26s)
=== RUN   TestAccScalewayIP_Basic
--- PASS: TestAccScalewayIP_Basic (11.16s)
=== RUN   TestAccScalewaySecurityGroupRule_Basic
--- PASS: TestAccScalewaySecurityGroupRule_Basic (5.02s)
=== RUN   TestAccScalewaySecurityGroup_Basic
--- PASS: TestAccScalewaySecurityGroup_Basic (2.48s)
=== RUN   TestAccScalewayServer_Basic
--- PASS: TestAccScalewayServer_Basic (161.38s)
=== RUN   TestAccScalewayServer_SecurityGroup
--- PASS: TestAccScalewayServer_SecurityGroup (302.07s)
=== RUN   TestAccScalewayVolumeAttachment_Basic
--- PASS: TestAccScalewayVolumeAttachment_Basic (448.10s)
PASS
ok      github.com/hashicorp/terraform/builtin/providers/scaleway

@tboerger
Copy link
Contributor

Is it possible that there is some kind of race while executing plan? It's hanging until I hit Ctrl+c, than it's printing a lot of output. But apply seem to work properly with that change, even if it takes some time to setup my cluster.

I'm just running into another issue from Scaleway where a node is hanging from time to time at starting server (kernel started).

@tboerger
Copy link
Contributor

I can confirm that this patch correctly fixes the attachment of multiple volumes. It took nearly 8 minutes to launch one server of the type C2S with 3 additional volumes:

# time terraform apply
data.scaleway_image.store: Refreshing state...
scaleway_server.store: Creating...
  enable_ipv6:  "" => "false"
  image:        "" => "047f1372-3923-471f-82ca-5ff69dbaf0f7"
  name:         "" => "store"
  private_ip:   "" => "<computed>"
  public_ip:    "" => "<computed>"
  state:        "" => "<computed>"
  state_detail: "" => "<computed>"
  type:         "" => "C2S"
scaleway_volume.store.2: Creating...
  name:       "" => "vol-2"
  server:     "" => "<computed>"
  size_in_gb: "" => "150"
  type:       "" => "l_ssd"
scaleway_volume.store.0: Creating...
  name:       "" => "vol-0"
  server:     "" => "<computed>"
  size_in_gb: "" => "150"
  type:       "" => "l_ssd"
scaleway_volume.store.1: Creating...
  name:       "" => "vol-1"
  server:     "" => "<computed>"
  size_in_gb: "" => "150"
  type:       "" => "l_ssd"
scaleway_volume.store.0: Creation complete
scaleway_volume.store.1: Creation complete
scaleway_volume.store.2: Creation complete
scaleway_server.store: Still creating... (10s elapsed)
scaleway_server.store: Still creating... (20s elapsed)
scaleway_server.store: Still creating... (30s elapsed)
scaleway_server.store: Still creating... (40s elapsed)
scaleway_server.store: Still creating... (50s elapsed)
scaleway_server.store: Still creating... (1m0s elapsed)
scaleway_server.store: Still creating... (1m10s elapsed)
scaleway_server.store: Still creating... (1m20s elapsed)
scaleway_server.store: Creation complete
scaleway_volume_attachment.store.0: Creating...
  server: "" => "c0186588-ecd2-44ad-a4de-2926d3654461"
  volume: "" => "0903698a-a952-4e54-bd83-797fe02a410d"
scaleway_volume_attachment.store.2: Creating...
  server: "" => "c0186588-ecd2-44ad-a4de-2926d3654461"
  volume: "" => "e67fc95c-cee8-4dba-9d61-453e4640151a"
scaleway_volume_attachment.store.1: Creating...
  server: "" => "c0186588-ecd2-44ad-a4de-2926d3654461"
  volume: "" => "9df7d61c-41b8-45be-b376-ce927ec10709"
scaleway_volume_attachment.store.0: Still creating... (10s elapsed)
scaleway_volume_attachment.store.2: Still creating... (10s elapsed)
scaleway_volume_attachment.store.1: Still creating... (10s elapsed)
scaleway_volume_attachment.store.0: Still creating... (20s elapsed)
scaleway_volume_attachment.store.2: Still creating... (20s elapsed)
scaleway_volume_attachment.store.1: Still creating... (20s elapsed)
scaleway_volume_attachment.store.0: Still creating... (30s elapsed)
scaleway_volume_attachment.store.2: Still creating... (30s elapsed)
scaleway_volume_attachment.store.1: Still creating... (30s elapsed)
scaleway_volume_attachment.store.2: Still creating... (40s elapsed)
scaleway_volume_attachment.store.0: Still creating... (40s elapsed)
scaleway_volume_attachment.store.1: Still creating... (40s elapsed)
scaleway_volume_attachment.store.2: Still creating... (50s elapsed)
scaleway_volume_attachment.store.1: Still creating... (50s elapsed)
scaleway_volume_attachment.store.0: Still creating... (50s elapsed)
scaleway_volume_attachment.store.0: Still creating... (1m0s elapsed)
scaleway_volume_attachment.store.2: Still creating... (1m0s elapsed)
scaleway_volume_attachment.store.1: Still creating... (1m0s elapsed)
scaleway_volume_attachment.store.2: Still creating... (1m10s elapsed)
scaleway_volume_attachment.store.0: Still creating... (1m10s elapsed)
scaleway_volume_attachment.store.1: Still creating... (1m10s elapsed)
scaleway_volume_attachment.store.1: Still creating... (1m20s elapsed)
scaleway_volume_attachment.store.2: Still creating... (1m20s elapsed)
scaleway_volume_attachment.store.0: Still creating... (1m20s elapsed)
scaleway_volume_attachment.store.1: Still creating... (1m30s elapsed)
scaleway_volume_attachment.store.2: Still creating... (1m30s elapsed)
scaleway_volume_attachment.store.0: Still creating... (1m30s elapsed)
scaleway_volume_attachment.store.2: Still creating... (1m40s elapsed)
scaleway_volume_attachment.store.1: Still creating... (1m40s elapsed)
scaleway_volume_attachment.store.0: Still creating... (1m40s elapsed)
scaleway_volume_attachment.store.0: Still creating... (1m50s elapsed)
scaleway_volume_attachment.store.2: Still creating... (1m50s elapsed)
scaleway_volume_attachment.store.1: Still creating... (1m50s elapsed)
scaleway_volume_attachment.store.0: Creation complete
scaleway_volume_attachment.store.1: Still creating... (2m0s elapsed)
scaleway_volume_attachment.store.2: Still creating... (2m0s elapsed)
scaleway_volume_attachment.store.1: Still creating... (2m10s elapsed)
scaleway_volume_attachment.store.2: Still creating... (2m10s elapsed)
scaleway_volume_attachment.store.2: Still creating... (2m20s elapsed)
scaleway_volume_attachment.store.1: Still creating... (2m20s elapsed)
scaleway_volume_attachment.store.1: Still creating... (2m30s elapsed)
scaleway_volume_attachment.store.2: Still creating... (2m30s elapsed)
scaleway_volume_attachment.store.1: Still creating... (2m40s elapsed)
scaleway_volume_attachment.store.2: Still creating... (2m40s elapsed)
scaleway_volume_attachment.store.1: Still creating... (2m50s elapsed)
scaleway_volume_attachment.store.2: Still creating... (2m50s elapsed)
scaleway_volume_attachment.store.1: Still creating... (3m0s elapsed)
scaleway_volume_attachment.store.2: Still creating... (3m0s elapsed)
scaleway_volume_attachment.store.2: Still creating... (3m10s elapsed)
scaleway_volume_attachment.store.1: Still creating... (3m10s elapsed)
scaleway_volume_attachment.store.1: Still creating... (3m20s elapsed)
scaleway_volume_attachment.store.2: Still creating... (3m20s elapsed)
scaleway_volume_attachment.store.1: Still creating... (3m30s elapsed)
scaleway_volume_attachment.store.2: Still creating... (3m30s elapsed)
scaleway_volume_attachment.store.2: Still creating... (3m40s elapsed)
scaleway_volume_attachment.store.1: Still creating... (3m40s elapsed)
scaleway_volume_attachment.store.1: Still creating... (3m50s elapsed)
scaleway_volume_attachment.store.2: Still creating... (3m50s elapsed)
scaleway_volume_attachment.store.2: Creation complete
scaleway_volume_attachment.store.1: Still creating... (4m0s elapsed)
scaleway_volume_attachment.store.1: Still creating... (4m10s elapsed)
scaleway_volume_attachment.store.1: Still creating... (4m20s elapsed)
scaleway_volume_attachment.store.1: Still creating... (4m30s elapsed)
scaleway_volume_attachment.store.1: Still creating... (4m40s elapsed)
scaleway_volume_attachment.store.1: Still creating... (4m50s elapsed)
scaleway_volume_attachment.store.1: Still creating... (5m0s elapsed)
scaleway_volume_attachment.store.1: Still creating... (5m10s elapsed)
scaleway_volume_attachment.store.1: Still creating... (5m20s elapsed)
scaleway_volume_attachment.store.1: Still creating... (5m30s elapsed)
scaleway_volume_attachment.store.1: Still creating... (5m40s elapsed)
scaleway_volume_attachment.store.1: Still creating... (5m50s elapsed)
scaleway_volume_attachment.store.1: Still creating... (6m0s elapsed)
scaleway_volume_attachment.store.1: Still creating... (6m10s elapsed)
scaleway_volume_attachment.store.1: Creation complete

Apply complete! Resources: 7 added, 0 changed, 0 destroyed.

The state of your infrastructure has been saved to the path
below. This state is required to modify and destroy your
infrastructure, so keep it safe. To inspect the complete state
use the `terraform show` command.

State path: terraform.tfstate

real    7m39.445s
user    0m0.604s
sys 0m0.606s

@tboerger
Copy link
Contributor

And it took 11 minutes to destroy :D

@nicolai86
Copy link
Contributor Author

@tboerger at the moment all scaleway_volume_attachments are executed in serial. This makes the behaviour slow for multiple volumes, but it's correct.

A future optimization would be to add a barrier on stop & start actions, then we could batch the attachments which would speed things up considerably.

But we should leave this off to future PRs :)

@tboerger
Copy link
Contributor

True, at least it should work now properly.

@dcharbonnier
Copy link

what are we waiting to merge this ?

@stack72
Copy link
Contributor

stack72 commented Oct 27, 2016

Hi @nicolai86

Sorry for the time taken to get back to you on this one - can you rebase it for me and I can get it merged?

You may have to change the timeout in the retry introduced by the other PR i merged today by you due to the 11 minutes above. Maybe make it a 20minute retry?

Paul

since scaleway requires servers to be powered off to attach volumes to, we need
to make sure that we don't power down a server twice, or power up a server while
it's supposed to be modified.

sadly terraform doesn't seem to sport serialization primitives for usecases like
this, but putting the code in question behind a `sync.Mutex` does the trick, too

fixes #9417
@nicolai86
Copy link
Contributor Author

I've rebased the PR and also increased the timeouts 👍

thanks for looking after this :)

@stack72
Copy link
Contributor

stack72 commented Oct 27, 2016

This LGTM now @nicolai86 :) Tests are green (as always!) - will squash and merge - thanks for the awesome work as usual

% make testacc TEST=./builtin/providers/scaleway                                                                                           1 ↵
==> Checking that code complies with gofmt requirements...
go generate $(go list ./... | grep -v /terraform/vendor/)
2016/10/27 16:38:12 Generated command/internal_plugin_list.go
TF_ACC=1 go test ./builtin/providers/scaleway -v  -timeout 120m
=== RUN   TestAccScalewayDataSourceBootscript_Basic
--- PASS: TestAccScalewayDataSourceBootscript_Basic (2.99s)
=== RUN   TestAccScalewayDataSourceBootscript_Filtered
--- PASS: TestAccScalewayDataSourceBootscript_Filtered (2.42s)
=== RUN   TestAccScalewayDataSourceImage_Basic
--- PASS: TestAccScalewayDataSourceImage_Basic (14.49s)
=== RUN   TestAccScalewayDataSourceImage_Filtered
--- PASS: TestAccScalewayDataSourceImage_Filtered (11.31s)
=== RUN   TestAccScalewayIP_importBasic
--- PASS: TestAccScalewayIP_importBasic (2.92s)
=== RUN   TestAccScalewaySecurityGroup_importBasic
--- PASS: TestAccScalewaySecurityGroup_importBasic (3.95s)
=== RUN   TestAccScalewayServer_importBasic
--- PASS: TestAccScalewayServer_importBasic (113.34s)
=== RUN   TestAccScalewayVolume_importBasic
--- PASS: TestAccScalewayVolume_importBasic (5.91s)
=== RUN   TestProvider
--- PASS: TestProvider (0.00s)
=== RUN   TestProvider_impl
--- PASS: TestProvider_impl (0.00s)
=== RUN   TestAccScalewayIP_Basic
--- PASS: TestAccScalewayIP_Basic (15.24s)
=== RUN   TestAccScalewaySecurityGroupRule_Basic
--- PASS: TestAccScalewaySecurityGroupRule_Basic (7.97s)
=== RUN   TestAccScalewaySecurityGroup_Basic
--- PASS: TestAccScalewaySecurityGroup_Basic (3.83s)
=== RUN   TestAccScalewayServer_Basic
--- PASS: TestAccScalewayServer_Basic (126.03s)
=== RUN   TestAccScalewayServer_SecurityGroup
--- PASS: TestAccScalewayServer_SecurityGroup (113.99s)
=== RUN   TestAccScalewayVolumeAttachment_Basic
--- PASS: TestAccScalewayVolumeAttachment_Basic (343.90s)
=== RUN   TestAccScalewayVolume_Basic
--- PASS: TestAccScalewayVolume_Basic (6.41s)
PASS
ok      github.com/hashicorp/terraform/builtin/providers/scaleway   774.719s

@stack72 stack72 merged commit d9a2e0d into hashicorp:master Oct 27, 2016
@nicolai86
Copy link
Contributor Author

<3 @stack72

antonbabenko pushed a commit to antonbabenko/terraform that referenced this pull request Oct 27, 2016
…hicorp#9493)

* provider/scaleway: fix scaleway_volume_attachment with count > 1

since scaleway requires servers to be powered off to attach volumes to, we need
to make sure that we don't power down a server twice, or power up a server while
it's supposed to be modified.

sadly terraform doesn't seem to sport serialization primitives for usecases like
this, but putting the code in question behind a `sync.Mutex` does the trick, too

fixes hashicorp#9417

* provider/scaleway: use mutexkv to lock per-resource

following  @dcharbonnier  suggestion. thanks!

* provider/scaleway: cleanup waitForServerState signature

* provider/scaleway: store serverID in var

* provider/scaleway: correct imports

* provider/scaleway: increase timeouts
mathieuherbert pushed a commit to mathieuherbert/terraform that referenced this pull request Oct 30, 2016
…hicorp#9493)

* provider/scaleway: fix scaleway_volume_attachment with count > 1

since scaleway requires servers to be powered off to attach volumes to, we need
to make sure that we don't power down a server twice, or power up a server while
it's supposed to be modified.

sadly terraform doesn't seem to sport serialization primitives for usecases like
this, but putting the code in question behind a `sync.Mutex` does the trick, too

fixes hashicorp#9417

* provider/scaleway: use mutexkv to lock per-resource

following  @dcharbonnier  suggestion. thanks!

* provider/scaleway: cleanup waitForServerState signature

* provider/scaleway: store serverID in var

* provider/scaleway: correct imports

* provider/scaleway: increase timeouts
gusmat pushed a commit to gusmat/terraform that referenced this pull request Dec 6, 2016
…hicorp#9493)

* provider/scaleway: fix scaleway_volume_attachment with count > 1

since scaleway requires servers to be powered off to attach volumes to, we need
to make sure that we don't power down a server twice, or power up a server while
it's supposed to be modified.

sadly terraform doesn't seem to sport serialization primitives for usecases like
this, but putting the code in question behind a `sync.Mutex` does the trick, too

fixes hashicorp#9417

* provider/scaleway: use mutexkv to lock per-resource

following  @dcharbonnier  suggestion. thanks!

* provider/scaleway: cleanup waitForServerState signature

* provider/scaleway: store serverID in var

* provider/scaleway: correct imports

* provider/scaleway: increase timeouts
@ghost
Copy link

ghost commented Apr 21, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Apr 21, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Scaleway: Adding multiple volumes doesn't work
5 participants