Improve PreloadEmbeddedSchema performance
#1369
Merged
Background
In #1071, we changed the way we store and load the bundled schemas to reduce our memory footprint. We released that work in version 0.29.3. User reports indicate that previous versions, e.g. 0.29.2, performed better.
The Problem
TL;DR: we read a schema file from disk every time, even if we have already loaded the file.
We schedule the PreloadEmbeddedSchema job when decoding a module (textDocument/didOpen, textDocument/didChange) and from the indexer when walking the workspace files.
The job reads the required providers from a module and checks which schemas are still missing from memory. We use the list of missing schemas to load schemas from disk. After loading a schema from disk, we store it in memory with the address (e.g. registry.terraform.io/hashicorp/aws) and the specific version found in the schema path (e.g. 4.33.0). The next time we come across the same provider in another module, we would reuse the stored schema and not load it again. At least in theory.
Provider requirements usually contain a source and a version (it is good practice to always include a version). When checking for missing schemas, we were using the unique combination of both to check if a schema already exists in memory. The schemas we bundle with the language server are only the most recent versions of all official and partner providers, so a mismatch between the required version and the bundled version is likely.
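To make the mismatch concrete, here is a minimal sketch of a lookup keyed on both address and version. It is not the language server's actual code: the names schemaKey, requirement, and missingByAddressAndVersion are hypothetical, and the version constraint is simplified to an exact version string.

```go
package main

import "fmt"

// schemaKey is a hypothetical cache key: provider address plus exact version.
type schemaKey struct {
	Address string
	Version string
}

// requirement is a simplified provider requirement taken from a module.
// Assumption: a real requirement carries a version constraint, not one version.
type requirement struct {
	Address string
	Version string
}

// missingByAddressAndVersion mimics a check that treats a schema as loaded
// only when both the address and the version match an in-memory entry.
func missingByAddressAndVersion(loaded map[schemaKey]bool, reqs []requirement) []requirement {
	var missing []requirement
	for _, r := range reqs {
		key := schemaKey{Address: r.Address, Version: r.Version}
		if !loaded[key] {
			missing = append(missing, r)
		}
	}
	return missing
}

func main() {
	// The bundled schema for aws was already loaded at its bundled version.
	loaded := map[schemaKey]bool{
		{Address: "registry.terraform.io/hashicorp/aws", Version: "4.33.0"}: true,
	}
	// The module asks for a different version of the same provider.
	reqs := []requirement{
		{Address: "registry.terraform.io/hashicorp/aws", Version: "4.30.0"},
	}
	// The provider is reported as missing although its schema is in memory,
	// so every such module would schedule another read from disk.
	fmt.Println(missingByAddressAndVersion(loaded, reqs))
}
```

In this model, any module that pins a version other than the bundled one is reported as missing on every check, even though a usable schema for that address is already in memory.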
A version mismatch triggers a read of the schema from disk (if the schema is part of our bundle), and we try to store it in memory again. Storing fails with the error "schema for %s is already loaded", because we have already stored that exact combination of address and version from the schema file on disk.
The Fix
When checking for missing schemas, we now compare only the address and use a version constraint that matches any version (> 0.0.0). We also account for legacy sources and the builtin Terraform provider.
Results
0.29.2 CPU & Memory after launch
>= 0.29.3 CPU & Memory after launch
This PR CPU & Memory after launch
As a result of this fix, we're back to 0.29.2 levels of CPU usage while maintaining the memory improvements of 0.29.3. This also reduces the work and I/O we do on each change event.
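For reference, the address-only check described under The Fix could look roughly like the sketch below. This is an illustration under stated assumptions, not the code from this PR: schemaStore, add, hasAnyVersion, and missingByAddress are hypothetical names, the "any version" constraint is modeled as "any entry with the same address", and legacy sources and the builtin provider are not covered.

```go
package main

import "fmt"

// schemaKey identifies a stored schema by address and exact version.
type schemaKey struct {
	Address string
	Version string
}

// schemaStore is a hypothetical stand-in for the in-memory schema store.
type schemaStore struct {
	loaded map[schemaKey]bool
}

func newSchemaStore() *schemaStore {
	return &schemaStore{loaded: make(map[schemaKey]bool)}
}

// add stores a schema and fails if the same address and version already exist,
// mirroring the "already loaded" error quoted in the description.
func (s *schemaStore) add(address, version string) error {
	key := schemaKey{Address: address, Version: version}
	if s.loaded[key] {
		return fmt.Errorf("schema for %s is already loaded", address)
	}
	s.loaded[key] = true
	return nil
}

// hasAnyVersion reports whether a schema with this address is stored,
// regardless of version (the sketch's stand-in for a "> 0.0.0" constraint).
func (s *schemaStore) hasAnyVersion(address string) bool {
	for key := range s.loaded {
		if key.Address == address {
			return true
		}
	}
	return false
}

// missingByAddress returns only the addresses with no stored schema at all.
func (s *schemaStore) missingByAddress(addresses []string) []string {
	var missing []string
	for _, addr := range addresses {
		if !s.hasAnyVersion(addr) {
			missing = append(missing, addr)
		}
	}
	return missing
}

func main() {
	s := newSchemaStore()
	if err := s.add("registry.terraform.io/hashicorp/aws", "4.33.0"); err != nil {
		fmt.Println("unexpected error:", err)
	}
	required := []string{
		"registry.terraform.io/hashicorp/aws",
		"registry.terraform.io/hashicorp/google",
	}
	// Only providers without any stored schema need a read from disk.
	fmt.Println(s.missingByAddress(required))
}
```

With this check, a provider whose bundled schema is already in memory is never scheduled for another disk read, whatever version the module requests.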