
Make provider configuration available prior to GetSchema call #281

Open
UnquietCode opened this issue Dec 14, 2019 · 4 comments
Labels
enhancement New feature or request upstream-protocol Requires change of protocol specification, i.e. can't be done under the current protocol

Comments

@UnquietCode commented Dec 14, 2019

SDK version

v1.4.0

Use-cases

When working with large or more complicated providers, it would be very helpful to have the provider configuration attributes available during the schema-gathering phase. Being able to make some slight adjustments to the schema based on the config opens up some possibilities that I am currently implementing through other means. For example, enabling premium vs. regular APIs, or turning off resources that aren't desired, to improve loading time. (The loading time in particular is a struggle because a plugin is invoked repeatedly during a normal run.)

Attempted Solutions

An available workaround is to use environment variables to provide configuration options to the provider. This involves duplicating the options in the Terraform configuration, and it also means validation of the options has to be done separately.
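
For illustration, here is a minimal sketch of that workaround against SDK v1 (the provider and resource names and the MYPROVIDER_PREMIUM variable are invented for this example):

package provider

import (
    "os"

    "github.com/hashicorp/terraform-plugin-sdk/helper/schema"
)

// resourceThing and resourcePremiumThing stand in for real resource
// definitions, which are omitted here.
func resourceThing() *schema.Resource        { return &schema.Resource{} }
func resourcePremiumThing() *schema.Resource { return &schema.Resource{} }

// Provider assembles the resource map before Terraform ever calls
// GetSchema. Since the provider block's attributes are not available at
// this point, the premium/regular decision has to come from an
// environment variable instead of the Terraform configuration.
func Provider() *schema.Provider {
    resources := map[string]*schema.Resource{
        "myprovider_thing": resourceThing(),
    }
    if os.Getenv("MYPROVIDER_PREMIUM") == "true" {
        resources["myprovider_premium_thing"] = resourcePremiumThing()
    }
    return &schema.Provider{ResourcesMap: resources}
}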

Proposal

While it would be nice to have the provider configured in full prior to the initial calls, this would have to be optional, since many providers expect to do slower initialization work during that process. Instead, it would be great to simply have the configuration attributes, as specified in the Terraform configuration, made available ahead of time, so that they can be acted on in both the GetSchema and provider configuration phases.
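
To make the shape of this concrete, here is a purely hypothetical sketch of such a hook in Go (none of these names exist in the SDK or the protocol):

package provider

// PreSchemaConfigurer is an invented interface illustrating the proposal:
// the provider receives the raw attribute values from its provider block
// before GetSchema is answered. No API clients are initialized here; the
// slower initialization still happens in the normal Configure phase.
type PreSchemaConfigurer interface {
    PreSchemaConfigure(rawConfig map[string]interface{}) error
}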

References

I wasn't able to find an existing request of this nature.

@UnquietCode UnquietCode added the enhancement New feature or request label Dec 14, 2019
@radeksimko radeksimko added the upstream-protocol Requires change of protocol specification, i.e. can't be done under the current protocol label Jan 2, 2020
@radeksimko (Member)

Hi @UnquietCode
I will try to respond inline below.

> Being able to make some slight adjustments to the schema based on the config opens up some possibilities that I am currently implementing through other means.

Generally speaking, predictability and the ability to validate configuration early on are important values of Terraform. Giving a plugin the ability to dynamically modify parts of its schema on the fly is likely to go against that goal, and for that reason I imagine we'd be hesitant to make any changes in that area.

I may be misinterpreting your suggestion here, though, and/or what you are trying to achieve may be possible without having to modify the provider schema.

> For example, enabling premium vs. regular APIs, or turning off resources that aren't desired, to improve loading time.

Do you mind sharing some details about the API design here? e.g. what makes an API premium? Is it an entirely different endpoint (hostname)? Are users likely to use both regular and premium APIs at the same time (in the same configuration), or would they more likely use one of them? Is there any functional overlap between these APIs?

> The loading time in particular is a struggle because a plugin is invoked repeatedly during a normal run.

This sounds like an important argument. Do you mind sharing more details about this? e.g. how many resources are you dealing with and how many would you expect to get "disabled" under what circumstances?

Also, could you share any data regarding performance, if you have measured it in any way?

I did notice that our biggest official provider, AWS, tends to have a small delay before actually beginning to calculate the plan, apply, or most other plugin-dependent operations, which may have a couple of causes, the number of resources (539) and data sources (154) being one.

Our stance in the past was that most users of that particular plugin are more likely to be waiting on remote APIs, and that waiting is experienced repeatedly during any refresh/plan or apply operation. A user is, however, unlikely to run many such operations in quick succession, because each operation just can't finish quickly anyway (due to waiting for the API).

So when we put the delay into context with the other delays a user experiences, we concluded that optimizing this was not a priority. That doesn't mean we can't or shouldn't optimize it, just that the effort doesn't seem to match the value the majority of users would gain from such an optimization.

Having more data and knowing more details about your use case may help us better understand it and perhaps re-evaluate the importance of the mentioned optimizations or other changes in that area.

@paultyng paultyng added the waiting-response Issues or pull requests waiting for an external response label Jan 6, 2020
@UnquietCode (Author) commented Jan 7, 2020

Hey, thank you for the detailed response! I know that takes time and focus on your part.

On your point about consistency and predictability: I had expected some pushback here, and I even find myself agreeing with that as a goal. I can talk a little more about the API layout, but let me just answer these quickly first:

  • Is it an entirely different endpoint (hostname)? -- Yes
  • Are users likely to use both regular and premium APIs at the same time (in the same configuration)? -- Yes
  • Is there any functional overlap between these APIs? -- Yes

Mainly, there are services we offer at a premium that a normal user would have no use for, so the thought was that not even loading them could be a better experience for the average user. It actually isn't hard to imagine these as two separate plugins with a common codebase; however, there doesn't seem to be a great way to share provider configurations right now (correct me if I'm wrong), and we were hoping to keep the configuration burden rather low for new users.

The performance we've clocked is about 10 seconds for each plugin invocation, of which there are a few. Compared to the AWS provider, which actually makes a few network calls each time, it's about 10x slower. However, we also have about 10x more resources and data sources combined, about 10,000 total, so there is a lot of schema to chew through. I think I saw most of the time being spent in MessagePack serialization, so maybe some kind of caching or pre-rendering of the schema messages could help us as well.

When I think solely about the loading time problem, some other possibilities come to mind, some of which have their own feature requests already:

  • schema reuse -- tell terraform what can be reused (Allow provider to define uniqueness of a resource #224)
  • serialization caching -- if the schema doesn't change, can we cache the serialization of it somewhere outside of the process?
  • lazy load the resources and datasources map, and then ask the providers only for the resources and data sources they care about (it seems like the protobuf *Request objects kind of do this? see the sketch after this list)
  • try to call plugins in a way that lets them initialize on the fly as protocol methods are invoked
    • is the full schema needed every time?
    • can already-created configs be passed between processes?
  • better support for provider configuration sharing (variable interpolation, for example)
  • "plugin factory" which can load smaller units of functionality from a larger provider

As I mentioned, the premium API problem is related but more addressable. We would prefer not to have to break up our other provider into smaller units. I will give a hint as to why our providers are so big: they are built with code generators. Obviously this is a smaller use case, but increasingly there are some great Terraform plugin projects doing just that.

So then, if there is a path towards small configuration or optimization changes that would start to open up some breathing room for large providers, maybe we could contribute that on our own, with a little steering towards what would be an acceptable solution and performance goal.

@ghost ghost removed the waiting-response Issues or pull requests waiting for an external response label Jan 7, 2020
@mingfang commented Jan 7, 2020

+1
I built a plugin for Kubernetes that dynamically builds its schema from the Kubernetes OpenAPI spec:
https://github.com/mingfang/terraform-provider-k8s

My plugin needs this feature (the ability to get the configuration) so that it can generate the schema. Currently I depend on the default kubeconfig.
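
(For context, a toy sketch of what deriving Terraform schema from OpenAPI property types can look like; this is not the plugin's actual implementation:)

package provider

import "github.com/hashicorp/terraform-plugin-sdk/helper/schema"

// toySchemaFromOpenAPI maps a simplified table of OpenAPI property types
// to Terraform attributes. Real OpenAPI documents need full type and
// $ref resolution; this only shows the general idea.
func toySchemaFromOpenAPI(props map[string]string) map[string]*schema.Schema {
    out := make(map[string]*schema.Schema, len(props))
    for name, apiType := range props {
        s := &schema.Schema{Optional: true}
        switch apiType {
        case "integer":
            s.Type = schema.TypeInt
        case "boolean":
            s.Type = schema.TypeBool
        default:
            s.Type = schema.TypeString
        }
        out[name] = s
    }
    return out
}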

@radeksimko (Member)

Thank you for describing the API more closely - this is useful! It sounds like you do have a valid reason to keep the logic within one provider based on these details.

> It actually isn't hard to imagine these as two separate plugins with a common codebase; however, there doesn't seem to be a great way to share provider configurations right now (correct me if I'm wrong)

It is not something providers often do today (at least I'm not aware of many examples in the wild), but it's possible. I wouldn't call it an anti-pattern nor a bad idea - at least not today, with the information I have available. The only downside to be aware of is that interpolation doesn't work between the provider blocks themselves at this point, unless you are just interpolating data sources and variables. See hashicorp/terraform#4149, which describes this problem in more detail. It may not be a blocker, though, if each provider connects to the API independently and they only interpolate at the resource level.

> However, we also have about 10x more resources and data sources combined, about 10,000 total, so there is a lot of schema to chew through.

For better or worse, your provider seems to be at the other (more extreme) end of our diverse user base, and I'm not too surprised that you're experiencing these delays given the number of resources and data sources you have there. I don't know of any provider that would be anywhere near these figures.

> schema reuse -- tell terraform what can be reused (#224)

I would say #224 is aiming to solve a more logical problem, where a user mistakenly defines a duplicate resource and Terraform attempts to manage it in two or more places. I don't think it aims to solve any performance-related problem.

> serialization caching -- if the schema doesn't change, can we cache the serialization of it somewhere outside of the process?

Potentially yes - I suppose there is some space for improvement, but from my experience caching usually doesn't come "for free", so I'd personally be tempted to try everything else before building a cache mechanism.

> lazy load the resources and datasources map, and then ask the providers only for the resources and data sources they care about (it seems like the protobuf *Request objects kind of do this?)
> try to call plugins in a way that lets them initialize on the fly as protocol methods are invoked
> is the full schema needed every time?

This seems like the most appealing way to tackle this, to me 👍 It will require some changes in the protocol, and that most likely won't happen until we have deprecated the old protocol 4 (Terraform 0.11). Feel free to file a separate issue for that though - we can discuss this path in more detail there.

> can already-created configs be passed between processes?

Not sure I follow. What do you mean by "already-created configs"? Only the relevant parts of the parsed config are sent to the relevant plugin, and they are only made available to the relevant resource:

message ValidateResourceTypeConfig {
  message Request {
    string type_name = 1;
    DynamicValue config = 2;
  }
  message Response {
    repeated Diagnostic diagnostics = 1;
  }
}

> better support for provider configuration sharing (variable interpolation, for example)

Unless I'm mistaken, it sounds like you're proposing what I proposed back in 2015 in hashicorp/terraform#1199.

For the last two points, though, I have doubts that any optimization related to interpolation is going to make a significant performance difference in this context, because the whole config will eventually be interpolated and individually handed over to plugins for consumption. This is mostly by design, as it allows us to parallelise and decouple operations, and decoupling the memory makes it easier to reason about these abstractions.

"plugin factory" which can load smaller units of functionality from a larger provider

Sharing functionality is possible, but providers were never designed to be importable, and their Go code shouldn't represent a public API, because that is likely to cause headaches as providers need to comply with semantic versioning from the end user's perspective. For that reason we recommend decoupling shared logic into separate repos (Go modules) outside of the provider codebase.

We do have some plans to allow sharing of resources within a provider, but the original motivation there wasn't performance but an improvement of the user experience: #57. Perhaps we can discuss expanding that outside of a single provider.

> I will give a hint as to why our providers are so big: they are built with code generators. Obviously this is a smaller use case, but increasingly there are some great Terraform plugin projects doing just that.

Generated providers are certainly something we do want to support in the future. I have built a custom generator of provider code myself, which was used to generate the initial version of the K8S provider. Google also developed their own solution. In that context I understand some of the reasons and challenges. I think, though, that this is a parallel discussion, as generating a provider doesn't in itself imply its size, although it does make it easier to create and maintain bigger providers. It seems we do not have a GitHub issue for this yet - feel free to create one; I'd be happy to contribute some more thoughts.


10k resources really does sound like a lot. It's hard to judge if it's the right design, though, without knowing the purpose of the API and what these individual resources represent.

All I can suggest is to think about resources from the user's perspective and avoid blindly translating each API endpoint to a resource or a data source. Terraform is meant to help users reduce complexity, not just act as a proxy between an API and a human.

For example, AWS EC2 has plenty of capabilities, but we don't map each one to a resource. In fact, the aws_instance resource alone talks to many endpoints behind the scenes: https://github.com/terraform-providers/terraform-provider-aws/blob/master/aws/resource_aws_instance.go
This does present some other challenges (maintenance), but it gives better UX (the user is saved from having to put many pieces of the puzzle together) and better performance (we don't need as many resources).
