WIP: Add Azure provider (peer-review PR) #2053
Conversation
```go
d.Set("media_link", datadisk.MediaLink)

log.Printf("[DEBUG] Retrieving disk: %s", d.Id())
disk, err := virtualmachinedisk.NewClient(mc).GetDisk(d.Id())
```
This is just a nit, but why not create the client once and reuse it? It doesn't make sense to create a new client for each call, unless something in the client structure mutates after each call.
Something like:
```go
mc := meta.(management.Client)
diskClient, err := virtualmachinedisk.NewClient(mc)
if err != nil {
	return fmt.Errorf("...")
}
disk, err := diskClient.GetDisk(d.Id())
dataDisk, err := diskClient.GetDataDisk(vm, vm, vm, lun)
```
Yeah, that would indeed be cleaner... I'm currently adding acceptance tests and will update these (where applicable across all resources) as well.
Will continue review later. So far just a few comments, about 1/3 of the way in.
```go
d.Set("location", cs.Location)

log.Printf("[DEBUG] Retrieving instance: %s", d.Id())
dpmt, err := virtualmachine.NewClient(mc).GetDeployment(d.Id(), d.Id())
```
The hosted service name should be passed separately here, instead of using d.Id() for both arguments.
Only the azure_instance is fully working (for both Linux and Windows instances) now, but it needs some tests. network and disk are pretty much empty, but the idea is clear so they will not take too much time…
Starting to look pretty nice…
```go
name := d.Get("name").(string)

nc, err := virtualnetwork.NewClient(mc).GetVirtualNetworkConfiguration()
```
I must insist on the fact that networking operations need to be synced somehow.
Long story short: because of how the virtualnetwork API of Azure works (i.e. you ask for all the networking configuration associated with your account, make the necessary additions/modifications/removals to it, and send it all back to Azure, where it effectively gets 'pasted over' the current resources in Azure), it is extremely hazardous to do such operations concurrently.
With your current implementation, whenever two or more virtual networks (the only resource currently affected by this problem here) are defined and apply is run, each would fetch its own copy of the network configuration and 'apply itself' onto it; after which both would send the config back (neither network having any idea about the other's creation), and the one that sends the config last practically overrides the first comer's creation.
If you don't believe me, please attempt creating two or more virtual networks in one go. You'll see that only one of them actually gets created; the others fade silently out of existence, because you call Read at the end of Create and Terraform assumes the resource has been deleted in the meantime...
By far the most feasible solution here is simply using a mutex whenever resources affected by this limitation of the API (virtual networks here; local network connections and DNS definitions on my side as well) are Created/Read/Deleted.
Sure, it sucks big time, as updates to such resources would have to happen sequentially. One must bear in mind, however, that this is wholly a limitation of the Azure API, and there's really nothing we as Terraform developers can do but hit the +1 button here. 😞
I fully understand how the API works with regards to virtual networks, and indeed the API for virtual networks cannot be used concurrently in a safe way.
So yes, you are completely right that these network operations need to be synced somehow. But let me explain why your suggestion to fix it in the Azure provider itself isn't going to work...
In your implementation you added a mutex to guard concurrent use of a management.Client within the plugin, to prevent these kinds of problems with API endpoints that do not support being used concurrently. But there is no concurrency within the plugin itself. The concurrency is built into the core of Terraform, and it will create a new instance of the plugin (most likely in its own goroutine) for every call that can be made concurrently.
So in this case, if you create a Terraform config with two virtual networks in there, Terraform core will actually call the plugin twice, resulting in two separate instances of the plugin that will then both have their own mutex.
Considering all this, you'll now probably understand that having a mutex here (or any other means to synchronise access to the management.Client within the plugin itself) will not solve your problem, as it will only guard an already synchronous instance of the plugin from using the management.Client concurrently.
So my line of thought here was that we currently have only one, or maybe two (both not feasible), options to fix this:
- Fix the Azure API (the only real solution if you ask me);
- Change the internals of Terraform;
Unfortunately these are both changes that most likely will not be made anytime soon (if ever). Next to that, our influence on getting them changed is very limited at the least. So that leaves us with only one additional option, which would be to not offer a resource to manage virtual networks. But that in turn would take away a significant portion of the functionality, and with that the usability, of the Azure provider...
So I decided (for myself) that for the time being the best approach would be to just build the provider/resources as if it were safe to make concurrent API calls, and then document very clearly in the docs that for the azure_virtual_network resource there currently is a known limitation/issue with the API.
Luckily we can offer a (pretty simple) workaround in those same docs, so people can safely use Terraform configs that contain multiple azure_virtual_network resources. The workaround is to add an explicit dependency between all azure_virtual_network resources in your config by defining the depends_on argument for these resources. That way Terraform will make all calls to create virtual networks sequentially, one after the other...
Is it perfect? Nope! Is it an acceptable way forward at this very moment in time? Yes, I believe so...
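The depends_on workaround described above could look roughly like this in a user's config (resource names and arguments are illustrative, not taken from the PR):

```hcl
resource "azure_virtual_network" "first" {
  name = "vnet-one"
  # ...
}

resource "azure_virtual_network" "second" {
  name       = "vnet-two"
  # forces creation to happen after "first", serialising the API calls
  depends_on = ["azure_virtual_network.first"]
  # ...
}
```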
I do admit, I did not take as thorough a look through the internals of Terraform as I would have liked.
The exact way of handling provider plugins you're describing was my exact fear when I first hit this problem.
Weirdly enough, though, I can personally vouch that using a mutex there does take care of the problem by itself (i.e. I did not introduce any explicit config dependencies in my attempts; I just defined two independent virtual networks and let Terraform do its thing...).
I feel it would be best to ask someone more knowledgeable in Terraform's internals before we continue this debate further.
Will come back to you on this as soon as I get my clarifications...
I implemented a Terraform provider for Azure (still closed-source, since I haven't had permission to open-source it yet...) that ran into this exact issue. The only way to get around it was to serialise access to the API, as you say. However, I split the virtual network resource into network and subnet resources, so the problem was more acute: graph resolution meant that, with no explicit dependency management, both subnets could be created at once, and would be in different provider processes. In the end a mutex file was required, and one or the other had to block even though Terraform thought it was constructing them in parallel. Sadly this still doesn't stop concurrency issues from other management clients elsewhere. The API is just broken, and it seems that Microsoft don't care enough to fix it.
It's also worth discussing with the azure-sdk-for-go guys their plans to switch over to an auto-generated API based on the Resource Manager API instead of the Service Management API - it might end up rendering a big chunk of this obsolete in reasonably short order.
Well sir, I stand corrected!
After repeating your tests with a customised branch of the one in this PR, I could indeed clearly see that you're absolutely right!
Looking a little deeper into the code, it looks like TF is reusing existing connections to already connected plugins by multiplexing over those existing connections (see these files for more details).
Luckily it's just a small change to add the mutex, so I updated the code accordingly just before I finished writing the docs. The Virtual Network related API calls will now be synchronised 😃
As in this case, it often seems that direct communication (like over IRC) works a lot better to discuss stuff like this and/or to come to a resolution. So please know that you can almost always find me on IRC if anything is up.
Thanks for pointing this out! Will push the updated code and docs in a few minutes, as soon as the last round of acceptance tests is finished (ETA 15 min; they take about 40 minutes)...
Glad to have helped.
When I tested it; I wasn't so thorough as looking through the code as I was of just running it and seeing what happens, so I got to learn some more about Terraform's internals today, which is nice.
Good job on everything, and godspeed merging!
Can't wait to see your work get in and start expanding it even further from there on 😄
Next to the remaining docs, I also updated the code so any Virtual Network related API calls are now synchronised by using a mutex (thanks @aznashwan for pointing that out!).
```go
SettingsFile:   d.Get("settings_file").(string),
SubscriptionID: d.Get("subscription_id").(string),
Certificate:    []byte(d.Get("certificate").(string)),
}
```
Can you do a homedir expand on the settings_file? Like we do over here.
Updated, will commit in a second...
Okay, nearly completed my review pass. I've kicked off the AccTests locally and I have to run to a meeting. Will complete the review afterwards and we can get this in. 👌
I'll be behind a desk again later today, so will have a look in a few hours...
All AccTests passed locally. This is looking good to me. My nits can be cleaned up post-merge if you want - @svanharmelen - but will let you decide what you want to do. From my side: merge at will!
If a tilde is used in the path, this makes sure we expand it to a full path.
Fixed/updated the retrieval of the
🎉 🎊
```go
	}
}

func testAccCheckAzureVirtualNetworkDestroy(s *terraform.State) error {
```
Shouldn't this function just check if the resource still exists instead of actually deleting it itself?
Yep that's correct. Good catch!
Indeed a good catch! I think I might have used this (wrong) approach in the CloudStack provider as well. Will walk through both providers and update the tests where applicable...
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.