-
Notifications
You must be signed in to change notification settings - Fork 304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance when resolving the workspace root #6530
Conversation
Fixing the test |
@sgowroji do you have any suggestion on this? in the test it create a |
4b93e55
to
eda0807
Compare
@mai93 i have fixed the tests. All of them are due to the mock set up. Please take a look. Thanks! |
LGTM from me, @tpasternak if you have time can you review this? |
yep, I'll try it later today |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is indeed a really good finding. Thank you for the contribution! I just left some comments, but please don't consider it a finalized review, so you can hold on with fixing that. I'd like to try it out a little more tomorrow
return ImmutableMap.of(); | ||
logger.debug("getExternalWorkspaceRootsFile for " + workspaceName); | ||
File externalBase = SyncCache.getInstance(project) | ||
.get(workspaceName, (Project theProject, BlazeProjectData projectData) -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly I have no idea what are the consequences of this, all other usages of SyncCache are hardcoded 😅 It's a global storage so might cause conflicts. I would prefer to keep the data structure (map?) under the ProjectHelper
key and update it on demand
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used the workspaceName
as key to store the workspaceRoot
path. It might be a bit overkill to use it. I also think I could just keep a synchronized map instance in this class. But since the workspaceRoot map before was saved to the SyncCache so I went for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, but there was a whole map in a single entry for a key named WorkspaceHelper.class
while after your change there are N entries there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the prev implementation cache the entire map with the WorkspaceHelper class name. The map is huge with tens of thousands of entries, most of them is not useful. Because when you use the rule_jvm_external pinned version, all the external dependencies got downloaded directly into the external
dir. The map will cache all the external dependencies dir.
With the name, it will only cache the workspaceName that you installed with the rules_jvm_external
, with its path. for example android_mvn
, test_mvn
... It is not going to be lots of them. I can do an inspection and past the cache counts here, for our case. Considering we've already have lots of different namespaces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, SyncCache had a static number of entries, typically one per purpose, with hardcoded keys in the plugin's codebase. With your change, the cache becomes dynamic, and the number of entries varies.
The risk is that external workspace names are flat-structured. If another service follows your pattern, conflicts may arise. Other services might also need to store external workspace-related data and should reserve their own keys.
I'm not suggesting we revert to the old map system, but we should keep data within a map, not at the top level of sync-cache.
cc @ujohnny
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I can create a map and cache it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the impl to use a local map to cache the workspace root dirs. Using the SyncCache
does not make sense if we want to cache an map and consistently modifying the entries of the map. This should also fixed the issue you saw below when the init bazelProjectData
is null? The workspace root will only be cached if the value if the dir existed, means the bazelProjectData
is not null. Once the project is done initialization if you do the query again the value should returned as expected. @tpasternak
|
||
Path relativePath = bazelRootPath.relativize(path); | ||
if (relativePath.getNameCount() > 0) { | ||
String firstFolder = relativePath.getName(0).toString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure, but this might be conflicting with --experimental_sibling_repository_layout
.
Apart from that it seems that bazel allows external
directory name in the source root. We probably need to handle this case, too. But the old algorithm deosn't seem to support it, too, so we probably shouldn't care
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the old logic assume everything is under external
, if not things will break i think. But I can check that case. Would it be ok to do a followup PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if old code looks the same then yes, no need to fix it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I found this bug:
- Import https://github.com/bazelbuild/bazel
- Open the top-level BUILD file
Ok, this probably happens, because when you run sync, the blazeProjectData
entry might be null so the null is written to the cache
intellij/base/src/com/google/idea/blaze/base/sync/workspace/WorkspaceHelper.java
Lines 200 to 203 in eda0807
if (blazeProjectData == null) { | |
logger.debug("the blazeProjectData is null " + project.getName()); | |
return null; | |
} |
The local map cache should help on this. although before the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the last issue, sorry for such a ping pong.
By the way I just noticed that the current solution, as well as the previous one doesn't work with bzlmod, where the paths do have <reponame>~<something
stem, but that's another story
File[] children = provider.listFiles(getExternalSourceRoot(blazeProjectData)); | ||
if (children == null) { | ||
return ImmutableMap.of(); | ||
if (!workspaceRootCache.containsKey(workspaceName)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a primary use case for SyncCache, but I think it's still a good idea to keep it. Otherwise (what happens now) the cache is not cleared on resync. How about this way? Sorry for yet another round but i think it could lead to some bugs
@@ -201,7 +202,7 @@
if (bazelProjectData == null) {
return null;
}
-
+ var workspaceRootCache = SyncCache.getInstance(project).get(WorkspaceHelper.class, (p, data) -> new ConcurrentHashMap<String, WorkspaceRoot>());
if (!workspaceRootCache.containsKey(workspaceName)) {
File externalBase = new File(bazelProjectData.getBlazeInfo().getOutputBase(),
"external/" + workspaceName);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate more on the resync case? you mean within one user session when user click on the sync? for that case we don't want to clear the cache right? The external workspace dirs are not going to change. We do want to keep them right? Just trying to understand the case of the cache lifecycle. I thought the cache should stay as long as the WorkspaceHelper
instance stays. @tpasternak
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean this data's lifetime was previously managed by SyncCache class, which is cleared automatically every time when the sync occurs. I would prefer to keep this behavior. Otherwise it might cause problems when external repositories are renamed etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm if the external workspace root is renamed, the build dep would need to be renamed too right? eg, for some reason the namespace installed is changed in WORKSPACE from:
maven_install(
name = "maven",
artifacts = [
//artifacts
],
repositories = [
"https://repo1.maven.org/maven2",
],
)
to
maven_install(
name = "changed_maven",
artifacts = [
//artifacts
],
repositories = [
"https://repo1.maven.org/maven2",
],
)
Then wherever reference that namespace would need to be changed from
@maven//:artifact",
to
@changed_maven//:artifact",
right?
Actually i think busting the cache each time when resync is one of the reasons causing the IDE performance slow. But I could be missing some cases here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that the primary reason for the improvement was the cache previously being filled with all external repositories at once. Thanks to your change, it now fills lazily, one-by-one. The initial version of your PR also reused CacheSync, which cleared data during each resync. This approach seemed to be working well.
Additionally, it’s not just about renaming but also about cleaning the cache. It shouldn't grow uncontrollably.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tpasternak reverted. Please take a look!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, so after your change, we're putting non-qualified names into SyncCache again, which could cause conflicts if other services use the external repo name as keys. How about we try this approach instead? #6530 (comment)
Btw, the cache is not only used during sync, but whenever you click on a label in starlark code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated. I actually thought abt doing this but it was bit strange to me that we could not utilize the blazeProjectData
provided by the SyncCache
and need to keep modifying the cache value(the map). But I understand your concern. Please take another look.
* Improve performance when resolving the workspace root * fix test * check if in unit test mode * use a local map to cache the workspace root dirs * Revert "use a local map to cache the workspace root dirs" This reverts commit f1eac03. * cache the map to the syncCache * cleanup --------- Co-authored-by: Ivy Li <ili@snapchat.com>
Description from original PR (#6530): The original logic when resolving the dependency label, it loops through the entire {baze-base}/external dir, and check if the file is dir, then create a map for with the workspaceName as key and its dir path as value. Which when the external is really big the IO operation could hang there for a long time cause the IDE freeze. This change use the {baze-base}/external/{workspaceName} to construct the workspace root dir. And if it existed then construct the WorkspaceRoot and return. PiperOrigin-RevId: 679264404
Checklist
Please note that the maintainers will not be reviewing this change until all checkboxes are ticked. See
the Contributions section in the README for more
details.
Discussion thread for this change
Issue number: 5719
Description of this change
We have pretty big bazel project, and we are using the
rule_jvm_external
pinned feature, which download all the external dependencies to the{baze-base}/external
dir. Theexternal
folder could grow very big. One example the smallest cache folder for us:The original logic when resolving the dependency label, it loops through the entire
{baze-base}/external
dir, and check if the file is dir, then create a map for with the workspaceName as key and its dir path as value. Which when the external is really big the IO operation could hang there for a long time cause the IDE freeze.This change use the {baze-base}/external/{workspaceName} to construct the workspace root dir. And if it existed then construct the
WorkspaceRoot
and return.