Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add caching to various schema reading functions #15641

Closed
wants to merge 1 commit into from

Conversation

dshafik
Copy link

@dshafik dshafik commented Jul 27, 2017

This is a massive performance boost for larger, more complex (lots of nested Set) schemas. For a complex config it brings the time from >3hrs down to "instant".

This may not be the best solution to this problem, but it is a solution, and dramatically reduces the recursion that occurs. I suspect there is further caching that can be done within the for loop in addrToSchema() but I couldn't figure that out.


Reproduction

I have created a "simple" provider to replicate the issue here: https://github.com/dshafik/terraform/tree/nested-schema-example

The provider (excepting the schema is literally one simple function:

func nestedRead(d *schema.ResourceData, meta interface{}) error {
	start := time.Now().UnixNano()
	s := d.Get("rule").(*schema.Set)
	dump := spew.Sdump(s)

	end := time.Now().UnixNano()

	delta := end - start

	return fmt.Errorf("Time to Get: %d\n\n%s\n\n", delta, dump)
}

As you can see it is effectively just a schema.ResourceData.Get() call. You may need to go get github.com/davecgh/go-spew/spew for the dump to work, but you can obviously just remove spew call if you don't care about the output.

With the example config it takes ~1 minute; and you can drastically reduce that time by commenting out the option blocks.


Real

The actual schema is much bigger, but the schema definition doesn't seem to make a difference, it's entirely down to the config itself.

A full config (3hrs+ without this fix) can be seen here

@sean-
Copy link
Contributor

sean- commented Jul 27, 2017

@dshafik This approach doesn't seem right. As you say,

This may not be the best solution to this problem, but it is a solution,

Could you collect information to figure out why this is taking 3hrs in your case? I don't doubt this made it faster, but where is the 3hrs going? If the 3hrs is being spent recursing and reevaluating, that's a problem that should get fixed (possibly through memoization). If the 3hrs is spent making repeated network-based API calls, then most certainly a cache would be the right solution. Which is to say, I don't think we understand the problem yet. Can you use something like go-torch to attempt to answer that question?

@dshafik
Copy link
Author

dshafik commented Jul 27, 2017

@sean- there is nothing in the nested provider except an attempt to retrieve a top-level schema element with a complex schema/config. (a ResourceData.Get() call). This means zero-network code.

The issue is recursion trying to traverse and construct the data tree.

I've attached Flame graphs for the 1min and >3hr (killed after ~10 minutes) configs, alongside the pprof data, and config itself.

I've also updated dshafik/terraform/nested-schema-example to include this profiling.

flame-graphs.zip

@jbardin
Copy link
Member

jbardin commented Jul 27, 2017

Hi @dshafik,

Thanks for the profile data too! I don't think the flame graph is quite the right visualization for this, but I think it will be useful to find out why the number of calls to those functions is so high. The standard profiler isn't always good at this sort of thing since it's is a sampling profiler, but there are other methods available! (just need time to dust them off ;))

While this type of optimization is not unprecedented, the code paths being optimized here are obviously very fragile. While we intend to improve the config+schema situation, that's not a near-term possibility.

I can take a look here and see if there's a way forward with the caching idea, or if I happen stumble across any other obvious reasons for the performance issue.

@dshafik
Copy link
Author

dshafik commented Jul 27, 2017

@jbardin this is unfortunately a blocker for the Akamai provider; I'm open to different schema suggestions in the meantime but I don't really see a viable one that gives us all the functionality we require.

@jbardin
Copy link
Member

jbardin commented Aug 4, 2017

Closed by 15663

@jbardin jbardin closed this Aug 4, 2017
@ghost
Copy link

ghost commented Apr 8, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Apr 8, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants