Improve performance when resolving the workspace root #6530

ivyspirit · 2024-06-26T20:33:20Z

Checklist

I have filed an issue about this change and discussed potential changes with the maintainers.
I have received the approval from the maintainers to make this change.
This is not a stylistic, refactoring, or cleanup change.

Please note that the maintainers will not be reviewing this change until all checkboxes are ticked. See the Contributions section in the README for more details.

Discussion thread for this change

Issue number: 5719

Description of this change

We have a pretty big Bazel project, and we use the rules_jvm_external pinned feature, which downloads all the external dependencies into the {bazel-base}/external dir. The external folder can grow very large. For example, this is our smallest cache folder:

% echo "Total directories:" $(find {root}/.cache/bazel/arm64/96fb5a4ccfb8aa9cafd25443b98fa7e6/external -type d | wc -l) && du -sh {root}/.cache/bazel/arm64/96fb5a4ccfb8aa9cafd25443b98fa7e6/external

Total directories: 54733
5.8G	{root}/.cache/bazel/arm64/96fb5a4ccfb8aa9cafd25443b98fa7e6/external

The original logic for resolving a dependency label loops through the entire {bazel-base}/external dir, checks whether each entry is a directory, and builds a map with the workspaceName as the key and its directory path as the value. When the external dir is really big, this IO can hang for a long time and freeze the IDE.

This change instead uses {bazel-base}/external/{workspaceName} to construct the workspace root dir directly. If that dir exists, it constructs the WorkspaceRoot and returns it.
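To make the difference concrete, here is a minimal Java sketch contrasting the two strategies. It is not the plugin's code: `ExternalRootLookup`, `scanAllExternalRoots` and `resolveExternalRoot` are hypothetical names, and it assumes the default layout where each external repository lives directly under `<output_base>/external/<name>`. The eager version stats every entry in `external/` on the first lookup; the lazy version stats a single path per requested name.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper contrasting an eager directory scan with a per-name lookup. */
final class ExternalRootLookup {

  /** Old approach (sketch): list every child of external/ and build a name -> dir map. */
  static Map<String, Path> scanAllExternalRoots(Path outputBase) {
    Map<String, Path> roots = new HashMap<>();
    File[] children = outputBase.resolve("external").toFile().listFiles();
    if (children == null) {
      return roots; // external/ is missing or unreadable
    }
    for (File child : children) {
      // One isDirectory() call per entry, so the cost grows with the size of external/.
      if (child.isDirectory()) {
        roots.put(child.getName(), child.toPath());
      }
    }
    return roots;
  }

  /** New approach (sketch): build external/<workspaceName> directly and check only that path. */
  static Optional<Path> resolveExternalRoot(Path outputBase, String workspaceName) {
    Path candidate = outputBase.resolve("external").resolve(workspaceName);
    return Files.isDirectory(candidate) ? Optional.of(candidate) : Optional.empty();
  }
}
```

Both return the same directory for a name that exists; only the amount of IO done to get there differs.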

ivyspirit · 2024-06-26T20:58:39Z

Fixing the test

ivyspirit · 2024-06-27T00:53:01Z

The ExternalWorkspaceReferenceTest test failed because I check file.exists() here and here before returning the workspaceRoot. However, in the test, the TestFileSystem that creates the PsiFile does not seem to create a file in the test dir. Any suggestions? Once I remove the file-exists check, all the tests pass.

ivyspirit · 2024-06-27T18:28:39Z


@sgowroji do you have any suggestions on this? The test creates a VirtualFile through TempFileSystem, so the file does not exist on disk. But my code needs to check that the file exists before returning. I searched the code base and didn't find an example test that handles this case. Any suggestion would be appreciated.

ivyspirit · 2024-06-29T01:18:03Z

@mai93 I have fixed the tests. All of the failures were due to the mock setup. Please take a look. Thanks!
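The thread doesn't show the exact test fix beyond the mock setup (a later commit is titled "check if in unit test mode"). One general way to keep a disk-existence check testable is to inject it, so tests backed by an in-memory file system such as TempFileSystem can stub the answer. The sketch below assumes that approach; `WorkspaceRootResolver` and its constructor are hypothetical and not part of the plugin.

```java
import java.io.File;
import java.util.Optional;
import java.util.function.Predicate;

/**
 * Hypothetical resolver whose existence check is injected, so a unit test that never
 * writes to the real disk can stub it.
 */
final class WorkspaceRootResolver {
  private final File outputBase;
  private final Predicate<File> isDirectory;

  WorkspaceRootResolver(File outputBase, Predicate<File> isDirectory) {
    this.outputBase = outputBase;
    this.isDirectory = isDirectory;
  }

  /** Production wiring: ask the real file system. */
  static WorkspaceRootResolver forDisk(File outputBase) {
    return new WorkspaceRootResolver(outputBase, File::isDirectory);
  }

  /** Returns <outputBase>/external/<workspaceName> if the injected check says it is a directory. */
  Optional<File> resolve(String workspaceName) {
    File candidate = new File(new File(outputBase, "external"), workspaceName);
    return isDirectory.test(candidate) ? Optional.of(candidate) : Optional.empty();
  }
}
```

A test can then pass a predicate such as `f -> f.getName().equals("maven")` instead of creating real directories.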

mai93 · 2024-07-02T20:17:37Z

LGTM from me, @tpasternak if you have time can you review this?

tpasternak · 2024-07-03T11:59:01Z

yep, I'll try it later today

tpasternak

This is indeed a really good finding. Thank you for the contribution! I just left some comments, but please don't consider it a finalized review, so you can hold off on fixing anything yet. I'd like to try it out a little more tomorrow.


tpasternak · 2024-07-03T17:27:49Z

base/src/com/google/idea/blaze/base/sync/workspace/WorkspaceHelper.java

-      return ImmutableMap.of();
+    logger.debug("getExternalWorkspaceRootsFile for " + workspaceName);
+    File externalBase = SyncCache.getInstance(project)
+        .get(workspaceName, (Project theProject, BlazeProjectData projectData) -> {


Honestly, I have no idea what the consequences of this are; all other usages of SyncCache use hardcoded keys 😅 It's global storage, so it might cause conflicts. I would prefer to keep the data structure (a map?) under the WorkspaceHelper key and update it on demand.

I used the workspaceName as the key to store the workspaceRoot path. It might be a bit of overkill. I also think I could just keep a synchronized map instance in this class. But since the workspaceRoot map was previously saved to the SyncCache, I went with that.

Yeah, but previously there was a whole map in a single entry under a key named WorkspaceHelper.class, while after your change there are N entries there.

Yeah, the previous implementation cached the entire map under the WorkspaceHelper class key. The map is huge, with tens of thousands of entries, and most of them are not useful: when you use the rules_jvm_external pinned version, all the external dependencies are downloaded directly into the external dir, so the map caches every external dependency's dir.

With the name as the key, it only caches the workspace names you installed with rules_jvm_external, together with their paths, for example android_mvn, test_mvn... There won't be many of them. I can do an inspection and paste the cache counts for our case here, considering we already have lots of different namespaces.

Please check this out

Previously, SyncCache had a static number of entries, typically one per purpose, with hardcoded keys in the plugin's codebase. With your change, the cache becomes dynamic, and the number of entries varies.

The risk is that external workspace names are flat-structured. If another service follows your pattern, conflicts may arise. Other services might also need to store external workspace-related data and should reserve their own keys.

I'm not suggesting we revert to the old map system, but we should keep data within a map, not at the top level of sync-cache.
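A minimal sketch of the structure being suggested, using a simplified stand-in for SyncCache (`SimpleSyncCache` and `WorkspaceHelperSketch` are hypothetical and only mirror the "get or compute by key" idea, not the plugin's actual signatures). All per-repository entries live in one nested map under a class-qualified key, so a repository name such as `maven` cannot collide with a top-level key chosen by another service.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical stand-in for a project-wide cache keyed by arbitrary objects. */
final class SimpleSyncCache {
  private final Map<Object, Object> entries = new ConcurrentHashMap<>();

  /** Returns the value for key, computing and storing it on first access. */
  @SuppressWarnings("unchecked")
  <T> T get(Object key, Supplier<T> computer) {
    return (T) entries.computeIfAbsent(key, k -> computer.get());
  }

  int size() {
    return entries.size();
  }
}

final class WorkspaceHelperSketch {
  /** One fixed, class-qualified top-level key; per-repo data lives inside the nested map. */
  static Map<String, String> rootsMap(SimpleSyncCache cache) {
    return cache.get(WorkspaceHelperSketch.class, ConcurrentHashMap::new);
  }
}
```

The number of top-level entries stays fixed no matter how many repositories are looked up; only the nested map grows.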

cc @ujohnny

Ok. I can create a map and cache it.

I changed the implementation to use a local map to cache the workspace root dirs. Using the SyncCache does not make sense if we want to cache a map and keep modifying its entries. This should also fix the issue you saw below when the initial blazeProjectData is null: the workspace root is only cached if the dir exists, which means the blazeProjectData is not null. Once the project has finished initializing, querying again should return the expected value. @tpasternak
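The "only cache it if the dir exists" rule maps neatly onto `ConcurrentHashMap.computeIfAbsent`, which records no mapping when the mapping function returns null. A sketch under that assumption (`LazyRootCache` is a hypothetical name; passing a null `outputBase` stands in for blazeProjectData not being available yet):

```java
import java.io.File;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical lazy cache: a name is stored only after its directory has been found. */
final class LazyRootCache {
  private final Map<String, File> roots = new ConcurrentHashMap<>();

  /**
   * Returns the root, or null if outputBase is unknown (e.g. project data not loaded yet)
   * or the directory does not exist.
   */
  File get(File outputBase, String workspaceName) {
    if (outputBase == null) {
      return null; // Nothing is cached, so a later call after sync can still succeed.
    }
    // computeIfAbsent stores nothing when the function returns null,
    // so a miss is simply retried on the next lookup.
    return roots.computeIfAbsent(workspaceName, name -> {
      File candidate = new File(new File(outputBase, "external"), name);
      return candidate.isDirectory() ? candidate : null;
    });
  }

  int size() {
    return roots.size();
  }
}
```

Because misses are never stored, a lookup made before initialization completes does not poison later lookups.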


tpasternak · 2024-07-03T18:49:53Z

base/src/com/google/idea/blaze/base/sync/workspace/WorkspaceHelper.java

+
+    Path relativePath = bazelRootPath.relativize(path);
+    if (relativePath.getNameCount() > 0) {
+      String firstFolder = relativePath.getName(0).toString();


I'm not sure, but this might be conflicting with --experimental_sibling_repository_layout.

Apart from that, it seems that Bazel allows a directory named external in the source root. We probably need to handle this case too. But the old algorithm doesn't seem to support it either, so we probably shouldn't worry about it.

Yeah, the old logic assumes everything is under external; if not, I think things will break. But I can check that case. Would it be OK to do it in a follow-up PR?

If the old code behaves the same, then yes, no need to fix it.
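The diff above derives the repository name from the first name element of a relativized path. A standalone sketch of that idea with `java.nio.file.Path`, assuming the default `<output_base>/external/<repo>/...` layout (so `--experimental_sibling_repository_layout` is not handled). `ExternalPathNames.repoDirName` is a hypothetical helper, and relativizing against the external base rather than the diff's `bazelRootPath` is an assumption made to keep the example self-contained.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

final class ExternalPathNames {
  /**
   * Returns the first directory name under externalBase that contains path, i.e. the
   * repository directory in the default layout <output_base>/external/<repo>/...
   */
  static Optional<String> repoDirName(Path externalBase, Path path) {
    Path base = externalBase.toAbsolutePath().normalize();
    Path target = path.toAbsolutePath().normalize();
    if (!target.startsWith(base) || target.equals(base)) {
      // Outside external/ (relativize would yield ".." segments) or external/ itself.
      return Optional.empty();
    }
    Path relative = base.relativize(target);
    return Optional.of(relative.getName(0).toString());
  }
}
```

The `startsWith` guard matters: without it, a path outside `external/` would relativize to something like `../../src/BUILD`, whose first element is `..`.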

tpasternak

Also I found this bug:

Import https://github.com/bazelbuild/bazel
Open the top-level BUILD file

OK, this probably happens because when you run sync, the blazeProjectData entry might be null, so null is written to the cache.

intellij/base/src/com/google/idea/blaze/base/sync/workspace/WorkspaceHelper.java

Lines 200 to 203 in eda0807

    
    if (blazeProjectData == null) {
      logger.debug("the blazeProjectData is null " + project.getName());
      return null;
    }

ilisc2 · 2024-07-09T17:52:10Z


The local map cache should help with this. Before blazeProjectData finishes initializing, resolving the workspace root will not work, but querying again once initialization is done should work. I wonder how the old implementation handled this case; it seems there would be NPEs. And once the map is cached in the SyncCache it is never modified, so it might not recover.

tpasternak

I think this is the last issue, sorry for such a ping pong.

By the way, I just noticed that the current solution, like the previous one, doesn't work with bzlmod, where the repository directory names have the form <reponame>~<something>, but that's another story.

tpasternak · 2024-07-11T15:37:46Z

base/src/com/google/idea/blaze/base/sync/workspace/WorkspaceHelper.java

-    File[] children = provider.listFiles(getExternalSourceRoot(blazeProjectData));
-    if (children == null) {
-      return ImmutableMap.of();
+    if (!workspaceRootCache.containsKey(workspaceName)) {


This is not a primary use case for SyncCache, but I think it's still a good idea to keep using it. Otherwise (which is what happens now) the cache is not cleared on resync. How about doing it this way? Sorry for yet another round, but I think the current version could lead to some bugs.

@@ -201,7 +202,7 @@
     if (bazelProjectData == null) {
       return null;
     }
-
+    var workspaceRootCache = SyncCache.getInstance(project).get(WorkspaceHelper.class, (p, data) -> new ConcurrentHashMap<String, WorkspaceRoot>());
     if (!workspaceRootCache.containsKey(workspaceName)) {
       File externalBase = new File(bazelProjectData.getBlazeInfo().getOutputBase(), "external/" + workspaceName);

Can you elaborate on the resync case? Do you mean when the user clicks sync within one session? In that case we don't want to clear the cache, right? The external workspace dirs are not going to change, so we want to keep them. I'm just trying to understand the cache lifecycle. I thought the cache should live as long as the WorkspaceHelper instance does. @tpasternak

I mean this data's lifetime was previously managed by the SyncCache class, which is cleared automatically every time a sync occurs. I would prefer to keep this behavior. Otherwise, it might cause problems when external repositories are renamed, etc.
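A sketch of the lifetime being described: the per-repository map is stored under one class-qualified key in a cache that is emptied whenever a sync starts, so a renamed or removed repository cannot leave a stale entry behind and the map cannot grow across syncs. `ResyncClearedCache` and `RootLookup` are hypothetical stand-ins, not the plugin's SyncCache API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical cache whose entries are all dropped at the start of every sync. */
final class ResyncClearedCache {
  private final Map<Object, Object> entries = new ConcurrentHashMap<>();

  @SuppressWarnings("unchecked")
  <T> T get(Object key, Supplier<T> computer) {
    return (T) entries.computeIfAbsent(key, k -> computer.get());
  }

  /** Called when a sync starts; everything is recomputed lazily afterwards. */
  void onSync() {
    entries.clear();
  }
}

final class RootLookup {
  private final ResyncClearedCache cache;

  RootLookup(ResyncClearedCache cache) {
    this.cache = cache;
  }

  /** The per-repo map lives and dies with the sync that created it. */
  Map<String, String> roots() {
    return cache.get(RootLookup.class, ConcurrentHashMap::new);
  }
}
```

After `onSync()`, the next `roots()` call starts from an empty map, which is then refilled one repository at a time as labels are resolved.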

Hmm, if the external workspace root is renamed, the build deps would need to be renamed too, right? E.g., if for some reason the installed namespace is changed in WORKSPACE from:

maven_install(
    name = "maven",
    artifacts = [
        # artifacts
    ],
    repositories = [
        "https://repo1.maven.org/maven2",
    ],
)

to

maven_install(
    name = "changed_maven",
    artifacts = [
        # artifacts
    ],
    repositories = [
        "https://repo1.maven.org/maven2",
    ],
)

Then every reference to that namespace would need to change from

"@maven//:artifact"

to

"@changed_maven//:artifact"

right?

Actually, I think busting the cache on every resync is one of the reasons the IDE is slow. But I could be missing some cases here.

My understanding is that the primary reason for the improvement was that the cache was previously filled with all external repositories at once. Thanks to your change, it now fills lazily, one by one. The initial version of your PR also reused SyncCache, which cleared the data on each resync. This approach seemed to work well.

Additionally, it’s not just about renaming but also about cleaning the cache. It shouldn't grow uncontrollably.

@tpasternak reverted. Please take a look!

Hey, so after your change, we're putting non-qualified names into SyncCache again, which could cause conflicts if other services use the external repo name as keys. How about we try this approach instead? #6530 (comment)

Btw, the cache is not only used during sync, but whenever you click on a label in starlark code.

Updated. I actually thought about doing this, but it seemed a bit strange to me that we couldn't use the blazeProjectData provided by the SyncCache and instead had to keep modifying the cached value (the map). But I understand your concern. Please take another look.


* Improve performance when resolving the workspace root
* fix test
* check if in unit test mode
* use a local map to cache the workspace root dirs
* Revert "use a local map to cache the workspace root dirs"

This reverts commit f1eac03.

* cache the map to the syncCache
* cleanup

---------

Co-authored-by: Ivy Li <ili@snapchat.com>

Description from original PR (#6530): The original logic for resolving a dependency label loops through the entire {bazel-base}/external dir, checks whether each entry is a directory, and builds a map with the workspaceName as the key and its directory path as the value. When the external dir is really big, this IO can hang for a long time and freeze the IDE. This change instead uses {bazel-base}/external/{workspaceName} to construct the workspace root dir directly. If that dir exists, it constructs the WorkspaceRoot and returns it.

PiperOrigin-RevId: 679264404

Improve performance when resolving the workspace root

a76cae0

ivyspirit requested review from mai93, jastice, tpasternak and agluszak as code owners June 26, 2024 20:33

fix test

bc86e8c

ivyspirit closed this Jun 26, 2024

ivyspirit reopened this Jun 27, 2024

github-actions bot added the awaiting-review Awaiting review from Bazel team on PRs label Jun 27, 2024

sgowroji added product: IntelliJ IntelliJ plugin awaiting-user-response Awaiting response from author on PRs and removed awaiting-review Awaiting review from Bazel team on PRs labels Jun 27, 2024

sgowroji added awaiting-review Awaiting review from Bazel team on PRs and removed awaiting-user-response Awaiting response from author on PRs labels Jun 28, 2024

sgowroji assigned mai93 Jun 28, 2024

check if in unit test mode

eda0807

ivyspirit force-pushed the ivy-fork-freeze branch from 4b93e55 to eda0807 Compare June 28, 2024 23:45

tpasternak reviewed Jul 3, 2024


tpasternak reviewed Jul 4, 2024


use a local map to cache the workspace root dirs

f1eac03

tpasternak requested changes Jul 11, 2024


ilisc2 and others added 2 commits July 11, 2024 12:07

Revert "use a local map to cache the workspace root dirs"

7f44137

This reverts commit f1eac03.

cache the map to the syncCache

de7104e

cleanup

2f7cde4

tpasternak approved these changes Jul 16, 2024


tpasternak merged commit a04e9ab into bazelbuild:master Jul 16, 2024
6 checks passed

github-actions bot removed the awaiting-review Awaiting review from Bazel team on PRs label Jul 16, 2024

mai93 mentioned this pull request Jul 17, 2024

Update changelog for v2024.07.16 release #6570

Merged

ThomasCJY mentioned this pull request Sep 12, 2024

Android Studio UI Freezing while editing bazel build scripts #5719

Open

copybara-service bot mentioned this pull request Oct 1, 2024

Improve performance when resolving the workspace root #6820

Open
