Improve `PreloadEmbeddedSchema` performance #1369

dbanck · 2023-08-30T11:41:37Z

Background

In #1071, we changed the way we store and load the bundled schemas to reduce our memory footprint, and released that work in version 0.29.3. User reports indicate that performance was better in previous versions, e.g. 0.29.2.

The Problem

TL;DR: we read a schema file from disk every time, even if we have already loaded that file.

We schedule the `PreloadEmbeddedSchema` job when decoding a module (`textDocument/didOpen`, `textDocument/didChange`) and from the indexer when walking the workspace files.

The job reads the required providers from a module and checks which schemas are still missing from our in-memory store. We use the list of missing schemas to load schemas from disk. After loading a schema from disk, we store it in memory under its address (e.g. `registry.terraform.io/hashicorp/aws`) and the specific version found in the schema path (e.g. `4.33.0`). The next time we come across the same provider in another module, we reuse the stored schema instead of loading it again. At least in theory.

Provider requirements usually contain a source and a version (it is good practice to always include a version). When checking for missing schemas, we were using the unique combination of both to check if a schema already exists in memory. The schemas we bundle with the language server are only the most recent versions of all official and partner providers, so a mismatch between the required version and the bundled version is likely.

A version mismatch triggers a read of the schema from disk (if the schema is part of our bundle), and we then try to store it in memory (again). Storing fails with `schema for %s is already loaded`, because the store already holds the exact combination of address and version taken from the schema file on disk.

The Fix

When checking for missing schemas, we now only compare the address and use a version constraint that matches any version (`> 0.0.0`). We also account for legacy sources and the builtin Terraform provider.
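A minimal sketch of the address-only check, under the same kind of simplifying assumptions: `addrStore`, `bundledVersions`, and `preloadByAddress` are hypothetical names, the disk read is simulated with a counter, and the handling of legacy sources and the builtin provider is left out.

```go
package main

import "fmt"

// addrStore is a hypothetical in-memory store that tracks loaded versions per address.
type addrStore struct {
	versionsByAddress map[string][]string
	diskReads         int // counts simulated reads of bundled schema files
}

func newAddrStore() *addrStore {
	return &addrStore{versionsByAddress: make(map[string][]string)}
}

// bundledVersions maps a provider address to the version shipped in the bundle
// (assumed data for this sketch).
var bundledVersions = map[string]string{
	"registry.terraform.io/hashicorp/aws": "4.33.0",
}

// hasAnyVersion is the address-only check: any stored version counts as a match,
// which is the effect of a constraint that accepts every version.
func (s *addrStore) hasAnyVersion(address string) bool {
	return len(s.versionsByAddress[address]) > 0
}

// preloadByAddress loads the bundled schema only if no schema for the address exists yet.
func (s *addrStore) preloadByAddress(address string) {
	if s.hasAnyVersion(address) {
		return // already in memory, skip the disk read
	}
	version, ok := bundledVersions[address]
	if !ok {
		return // provider is not part of the bundle
	}
	s.diskReads++ // simulated disk read of the bundled schema file
	s.versionsByAddress[address] = append(s.versionsByAddress[address], version)
}

func main() {
	s := newAddrStore()
	// Repeated change events for modules that require the same provider.
	for i := 1; i <= 3; i++ {
		s.preloadByAddress("registry.terraform.io/hashicorp/aws")
		fmt.Println("run", i, "disk reads:", s.diskReads)
	}
}
```

With the version removed from the lookup, repeated runs for the same provider address stop after the first simulated disk read.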

Results

0.29.2 CPU & Memory after launch

>= 0.29.3 CPU & Memory after launch

This PR CPU & Memory after launch

As a result of this fix, we're back to 0.29.2 levels of CPU usage while maintaining the memory improvements of 0.29.3. This also reduces the work and I/O we do on each change event.

github-actions · 2023-09-30T03:02:12Z

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

Only compare the address for missing schemas on preload

fe24feb

dbanck requested a review from a team as a code owner August 30, 2023 11:41

Improve MissingSchemas return value

8d78d4e

dbanck self-assigned this Aug 30, 2023

dbanck added the bug and performance labels Aug 30, 2023

Replace fake constraint

382f009

radeksimko approved these changes Aug 30, 2023

dbanck merged commit d730e72 into main Aug 30, 2023
20 checks passed

dbanck deleted the b-fix-preload-schema-performance branch August 30, 2023 15:50

This was referenced Aug 30, 2023

Performance Improvement Plan hashicorp/vscode-terraform#1557

Open

Performance Improvement Plan #1358

Open

dbanck mentioned this pull request Sep 4, 2023

Super-slow, character-by-character diagnostics since 0.29.3 #1317

Closed

1 task

github-actions bot locked as resolved and limited conversation to collaborators Sep 30, 2023
