
searching for source files taking many seconds #1282

Closed
jordangarside opened this issue May 13, 2021 · 18 comments
Labels
fixed in next version (main) A fix has been implemented and will appear in an upcoming version

Comments

@jordangarside

jordangarside commented May 13, 2021

I apologize if this has already been asked, but I couldn't quickly find a clear similar issue.

In the monorepo at Robinhood I'm experiencing extended time searching for source files.
It seems like there are too many files (~44k), but I couldn't seem to find a setting to exclude paths from analysis.

Basically, this "Searching for source files" step blocks for 5+ seconds on every analysis, making autocomplete and go-to-definition extremely slow.

Python Version: 3.8.6
VSCode Version: 1.56.0
Python Extension: 2021.5.829140558
Pylance Version: 2021.5.2

Settings:

    "python.analysis.autoSearchPaths": false,
    "python.analysis.diagnosticMode": "openFilesOnly"

Examples:

[Info  - 2:08:06 PM] Searching for source files
[Info  - 2:08:12 PM] Found 44179 source files
[Info  - 2:13:47 PM] Searching for source files
[Info  - 2:13:52 PM] Found 44179 source files
[Info  - 2:13:52 PM] Searching for source files
[Info  - 2:13:58 PM] Found 44179 source files
@jakebailey
Member

jakebailey commented May 13, 2021

Somewhat similar to #1281; can you see if there are any directories in your workspace that need to be excluded? See that thread for a config.

I'm curious what paths we are including that we shouldn't be, and if there's something we've missed.

@jordangarside
Author

Awesome, this worked great, thanks!

Filtering directories with "include"/"exclude" in pyrightconfig.json showed that, for us, the cause is the .git directory; simply excluding it fixes the issue.

Not clear to me why including it continuously leads to:
Background analysis message: markAllFilesDirty

I've disabled all other plugins as well while testing this.

@jakebailey
Member

Hmm, .git should already be excluded in the default configuration, so I'm not quite sure how explicitly including it fixes things. Very strange.

markAllFilesDirty means that some event happened that we believe invalidates the analysis, so if you're seeing it a lot, that may be another problem (maybe some file watching of the workspace that we are now not properly ignoring?)

Logs are always helpful.

@jordangarside
Author

jordangarside commented May 13, 2021

Ohh, it seems I misspoke:

Listing everything except .git in "include" works, but the inverse (making .git the only "exclude" entry) does not.

i.e.
This works:

{
    "include": [
        // every file and folder except .git
        // ...
    ],
}

This doesn't work:

{
    "exclude": [
        ".git"
    ]
}

Logs without .git in "include" list:

Background analysis message: setFileOpened
Background analysis message: markFilesDirty
Background analysis message: setFileOpened
Background analysis message: markFilesDirty
Background analysis message: getSemanticTokens delta
[BG(1)] getSemanticTokens delta previousResultId:1620945259287 at /monorepo/project/project/app.py ...
[BG(1)]   parsing: /monorepo/project/project/app.py (1ms)
[BG(1)]   binding: /monorepo/project/project/app.py (0ms)
[BG(1)] getSemanticTokens delta previousResultId:1620945259287 at /monorepo/project/project/app.py (67ms)
Background analysis message: setFileOpened
Background analysis message: markFilesDirty
Background analysis message: getDiagnosticsForRange
Background analysis message: getDiagnosticsForRange

Logs with .git in "include" list (happening in the background without touching the file):

[Info  - 3:36:58 PM] Searching for source files
[Info  - 3:37:02 PM] Found 29446 source files
Background analysis message: setTrackedFiles
Background analysis message: markAllFilesDirty
Background analysis message: analyze
Background analysis message: invalidateAndForceReanalysis
Background analysis message: invalidateAndForceReanalysis
[Info  - 3:37:02 PM] Searching for source files
[Info  - 3:37:06 PM] Found 29446 source files
Background analysis message: setTrackedFiles
Background analysis message: markAllFilesDirty
Background analysis message: analyze
Background analysis message: invalidateAndForceReanalysis
Background analysis message: invalidateAndForceReanalysis

It appears something in VS Code is touching my .git directory very often (probably what triggers that markAllFilesDirty). My .git directory only updates while VS Code is in focus, and none of the files in it seem to actually change.

@jakebailey jakebailey reopened this May 13, 2021
@jakebailey
Member

Going to reopen this; I think we need to verify that our file watching filter is actually ignoring .git changes.

I also misspoke; invalidateAndForceReanalysis is the "scary" one that really invalidates the world. The other calls aren't as scary.

@erictraut
Contributor

".git" is not a folder or a python file, so adding it to "exclude" should have no effect.

@bschnurr
Member

Turning on Pyright's verboseOutput might shed some light on which files are being touched.

{
    "verboseOutput": true
}

@jordangarside
Author

jordangarside commented May 13, 2021

Yup, that's helpful. This is what I get when I include .git:

[Info  - 4:36:23 PM] SourceFile: Received fs event 'add' for path '/monorepo/.git/.watchman-cookie-jordan-garside--...'
[Info  - 4:36:23 PM] SourceFile: Received fs event 'change' for path '/monorepo/.git/.watchman-cookie-jordan-garside--...'
Background analysis message: markAllFilesDirty
Background analysis message: analyze
...
[Info  - 4:36:23 PM] Searching for source files

@jakebailey
Copy link
Member

jakebailey commented May 13, 2021

Yeah, it seems like some tool is plopping something into .git and triggering FS events. Our current check should really ignore any event whose path contains /.git/ (modulo path separator normalization), but instead it only checks whether the path ends with .git (ideally it would check both). https://github.com/microsoft/pyright/blob/7bb059ecbab5c0c446d4dcf5376fc5ce8bd8cd26/packages/pyright-internal/src/analyzer/service.ts#L1130

I will send a fix for the next release.
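The difference between the two checks can be sketched in TypeScript. The function names below are hypothetical and this is not Pyright's actual code (see the linked service.ts for that); it only assumes that an FS event arrives as a raw path string that may use either / or \ as a separator.

```typescript
// Hypothetical sketch, not Pyright's implementation.

// Normalize Windows-style separators so one check works on every platform.
function normalizeSeparators(rawPath: string): string {
  return rawPath.replace(/\\/g, "/");
}

// Narrow check: matches only when the event path *is* the .git directory.
function endsWithGit(rawPath: string): boolean {
  return normalizeSeparators(rawPath).replace(/\/+$/, "").endsWith("/.git");
}

// Broader check: matches any path with a ".git" segment anywhere, so files
// created inside .git (such as watchman cookies) are covered as well.
// A path ending in ".git" also has ".git" as its last segment, so this
// check subsumes the narrow one.
function isInsideGitDir(rawPath: string): boolean {
  return normalizeSeparators(rawPath).split("/").includes(".git");
}

// Hypothetical cookie name standing in for the elided one in the log.
console.log(endsWithGit("/monorepo/.git/.watchman-cookie-example"));    // false
console.log(isInsideGitDir("/monorepo/.git/.watchman-cookie-example")); // true
```

Splitting on segments rather than doing a substring search for ".git" avoids false positives such as a .gitignore file or a directory named my.git.stuff.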

@jordangarside
Copy link
Author

Thanks everyone!

@jakebailey
Copy link
Member

Out of curiosity, what was the file extension for that watchman file? I think we should end up ignoring non-.py/.pyi events, but it may be the case that we mis-treat this event as a folder change.

@jordangarside
Copy link
Author


There is no extension on the watchman file.
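The suspicion that an extensionless file gets mis-treated as a folder change can be illustrated with a small classifier. This is a hypothetical sketch: the EventAction names and the rules are assumptions, not Pyright's logic. Here .py/.pyi events dirty a single file, other extensions are ignored, and an extensionless path is treated conservatively as a possible folder change, which is one way an extensionless watchman cookie could end up forcing a full rescan.

```typescript
// Hypothetical sketch of classifying file-watcher events; not Pyright's code.
type EventAction = "ignore" | "markFileDirty" | "rescanWorkspace";

// Return the extension of the last path segment, or "" if there is none.
// A leading dot (".watchman-cookie-...") marks a hidden file, not an extension.
function extensionOf(eventPath: string): string {
  const normalized = eventPath.replace(/\\/g, "/");
  const base = normalized.slice(normalized.lastIndexOf("/") + 1);
  const dot = base.lastIndexOf(".");
  return dot > 0 ? base.slice(dot) : "";
}

function classifyFsEvent(eventPath: string): EventAction {
  const ext = extensionOf(eventPath);
  if (ext === ".py" || ext === ".pyi") {
    return "markFileDirty"; // a source file changed: re-check just that file
  }
  if (ext === "") {
    // Without an extension the path might be a directory, so a cautious
    // watcher assumes the set of source files may have changed.
    return "rescanWorkspace";
  }
  return "ignore"; // e.g. README.md, setup.cfg
}

console.log(classifyFsEvent("/monorepo/project/project/app.py"));        // markFileDirty
console.log(classifyFsEvent("/monorepo/.git/.watchman-cookie-example")); // rescanWorkspace
```

Dropping everything under .git before classification, as in the fix described above, would keep the cookie from ever reaching the extensionless fallback.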

@jakebailey
Copy link
Member

Thanks. The logic here is due for a re-look, but file watching is temperamental and hard to screw with without breaking someone.

@jordangarside
Copy link
Author

There doesn't seem to be an easy way to exclude it in pyrightconfig.json without listing out the full "include" section:

{
    "exclude": [
        "**/.watchman*"
    ]
}

^ This still leads to:

[Info  - 4:52:10 PM] SourceFile: Received fs event 'add' for path '/monorepo/.git/.watchman-cookie-jordan-garside--...'

@jordangarside
Copy link
Author

Maybe I'll just try this in the near term:

{
    "include": [
        "**/*.py",
        "**/*.pyi"
    ]
}

@jakebailey
Copy link
Member

The FS events are separate from exclude, as "exclude" only excludes the file from the source file scanning at startup. It doesn't exclude it from file watching or exclude it from being importable.

The fix I'm applying will still cause the log to happen in verbose mode (it is verbose mode), but the event will be ignored.
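The split between the startup scan and file watching can be sketched as follows. Everything here is hypothetical: the glob matcher deliberately supports only `**/` and `*` (far simpler than Pyright's real pattern handling), and the function names are invented for illustration. The point is that "exclude" is an input to enumeration only, so "**/.watchman*" can match the cookie path while its FS event is still received and logged.

```typescript
// Hypothetical sketch: "exclude" filters the startup scan only.

// Minimal glob-to-RegExp conversion supporting "**/" (zero or more
// directories) and "*" (any characters within one path segment).
function globToRegExp(glob: string): RegExp {
  let source = "";
  for (let i = 0; i < glob.length; i++) {
    const c = glob[i];
    if (c === "*" && glob[i + 1] === "*" && glob[i + 2] === "/") {
      source += "(?:.*/)?";
      i += 2;
    } else if (c === "*") {
      source += "[^/]*";
    } else if ("\\^$+?.()|{}[]".includes(c)) {
      source += "\\" + c; // escape regex metacharacters
    } else {
      source += c;
    }
  }
  return new RegExp("^" + source + "$");
}

function isExcluded(relPath: string, exclude: string[]): boolean {
  return exclude.some((pattern) => globToRegExp(pattern).test(relPath));
}

// Startup scan: keep .py/.pyi files that no exclude pattern matches.
function enumerateSourceFiles(relPaths: string[], exclude: string[]): string[] {
  return relPaths.filter(
    (p) => (p.endsWith(".py") || p.endsWith(".pyi")) && !isExcluded(p, exclude),
  );
}

// Watcher callback: the exclude list is intentionally not an input here,
// so every event is still received and (in verbose mode) logged.
function onFsEvent(relPath: string, log: string[]): void {
  log.push(`Received fs event for path '${relPath}'`);
}

const exclude = ["**/.watchman*"];
const cookie = ".git/.watchman-cookie-example"; // hypothetical file name
const log: string[] = [];
console.log(isExcluded(cookie, exclude)); // true
onFsEvent(cookie, log);
console.log(log.length); // 1
```

Even though the pattern matches the cookie, the event still lands in the log; only a filter inside the watcher path (like the .git check in the fix) can suppress it.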

@jakebailey jakebailey added the fixed in next version (main) A fix has been implemented and will appear in an upcoming version label May 14, 2021
@github-actions github-actions bot removed the triage label May 14, 2021
@jakebailey
Copy link
Member

I've made the above change to suppress these .git events; it should be available in the release next week.

@jakebailey
Copy link
Member

This issue has been fixed in version 2021.5.3, which we've just released. You can find the changelog here: https://github.com/microsoft/pylance-release/blob/main/CHANGELOG.md#202153-19-may-2021
