Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(perf): lazy load workspace dependency #7352

Merged
merged 2 commits into from
Apr 9, 2024

Conversation

H4ad
Copy link
Contributor

@H4ad H4ad commented Apr 7, 2024

Lazy loading workspaces and glob dependencies, will avoid loading the dependency when running inside a repo that is not a workspace and also will not load glob when is not needed.

Before:

image

After:

image

@H4ad H4ad requested a review from a team as a code owner April 7, 2024 22:06
lib/utils/log-file.js Outdated Show resolved Hide resolved
@wraithgar
Copy link
Member

Test failures are gonna be real educational on this one, I can just feel it.

@H4ad
Copy link
Contributor Author

H4ad commented Apr 7, 2024

I tried to read why the tests are falling but I have no clue, I tried to roll back and looks like the map-workspaces is causing most of the failures.

@H4ad
Copy link
Contributor Author

H4ad commented Apr 7, 2024

Oh, I think I understand the issue, the @npmcli/config is loaded before require('graceful-fs').gracefulify(require('fs')) during npm.js, and the module path-curry doesn't work well with graceful-fs, if we remove graceful-fs, the tests work again.

The solution is:

  • try to find the reason why path-curry fails with graceful-fs
  • eager-load path-curry before graceful-fs, causes +4ms but saves 6ms.

@wraithgar
Copy link
Member

try to find the reason why path-curry fails with graceful-fs

@isaacs maintains both so maybe they know off the top of their head?

@wraithgar
Copy link
Member

I think 6ms may not be worth our trouble to have to start setting up a Jenga tower of correct require orders here. We may have to back off on that lazy loading.

@isaacs
Copy link
Contributor

isaacs commented Apr 7, 2024

I have no idea. If you have a repro that shows the two conflicting in some way, I can take a look, but path-scurry doesn't even do much with the fs, just reads from it. Maybe gfs is breaking some function signature path scurry is expecting? 🤷

@H4ad
Copy link
Contributor Author

H4ad commented Apr 7, 2024

setting up a Jenga tower of correct require orders here

We already have this jenga tower, if you move Config below the graceful-fs, the tests will break.

@isaacs Just clone this repo and checkout to this branch, and then run node test/lib/commands/doctor.js.

@H4ad H4ad force-pushed the perf/lazy-load-workspaces branch 2 times, most recently from f04de4c to 23d7b08 Compare April 7, 2024 23:52
@wraithgar
Copy link
Member

This repo no longer has that error. It may not be worth your time if it's not something you know off the top of your head @isaacs. Unsubscribe and move on, thanks for your quick response.

@H4ad
Copy link
Contributor Author

H4ad commented Apr 7, 2024

I didn't fix the issue with glob, I just removed the dependency.

@isaacs if you want to understand more, you can use this branch: https://github.com/H4ad/cli/tree/issue-with-graceful

@H4ad H4ad force-pushed the perf/lazy-load-workspaces branch from 23d7b08 to c8af873 Compare April 8, 2024 00:02
@H4ad
Copy link
Contributor Author

H4ad commented Apr 8, 2024

The tests are ok, just the coverage is not enough, right?

@H4ad H4ad force-pushed the perf/lazy-load-workspaces branch from c8af873 to 3734005 Compare April 8, 2024 00:15
@H4ad
Copy link
Contributor Author

H4ad commented Apr 8, 2024

@wraithgar On Node 18.x, we have a race condition while deleting the logs.

Since we did not wait to delete the logs, the readdir and rm didn't resolve and delete the files before the process exit.

If I replace both versions to use the sync version, then the tests on 18.20.0 were success.

What do you want to do in this case?

@wraithgar
Copy link
Member

I'm digging into this today but I do seem to remember this comment at the end of load() being relevant, and the way we addressed this bug in tests. npm itself doesn't block on that operation but we were in tests.

    // Return the result so it can be awaited in tests
    return this.#cleanLogs()

@wraithgar
Copy link
Member

It's entirely possible we have bugs here. I do know that sometimes my npm logs dir is filled with old logs that never get cleaned up, and haven't taken the time to triage that.

@wraithgar
Copy link
Member

It's entirely possible we have bugs here. I do know that sometimes my npm logs dir is filled with old logs that never get cleaned up, and haven't taken the time to triage that.

What I'm seeing is unrelated, we're deleting the newest logs not the oldest ones.

-      for (const file of files.slice(0, toDelete)) {
+      for (const file of files.slice(toDelete * -1)) {

That will fix the bug I'm seeing.

@H4ad
Copy link
Contributor Author

H4ad commented Apr 8, 2024

we're deleting the newest logs not the oldest ones.

The algorithm deletes the oldest ones, they are automatically ordered when they come from the readdir (at least on linux)

Also, the tests are passing, so I guess the deleting is right.

To fix, should I include await or make the code sync?

@lukekarrys
Copy link
Contributor

The original idea behind this method not containing any synchronous code is that it is all run as a best effort and we don't want to block anything. So we do await fs.rm for each file and if at any time during these operations, the CLI command finishes process.exit will be called explicitly to cancel any remaining fs operations.

So I do think we should keep all the log cleaning code using async methods.

@wraithgar
Copy link
Member

The algorithm deletes the oldest ones, they are automatically ordered when they come from the readdir (at least on linux)

They are not in osx. Looks like we have some cross-platform bugs. That's outside the scope of this PR but at least now I know exactly where that bug I was seeing lies.

So I do think we should keep all the log cleaning code using async methods.

Yes this post is 100% my understanding of the situation.

@wraithgar
Copy link
Member

Maybe I'm missing something, but the load function returns #cleanLogs which does await all of the rm operations. I still don't understand how the test are finishing early. It's npm itself that doesn't await the operations.

@H4ad
Copy link
Contributor Author

H4ad commented Apr 8, 2024

We do not wait in these lines:

cli/lib/npm.js

Lines 260 to 266 in 70497cb

this.time('npm:load:logFile', () => {
this.#logFile.load({
path: this.logPath,
logsMax: this.config.get('logs-max'),
})
log.verbose('logfile', this.#logFile.files[0] || 'no logfile created')
})

I suppose it was intentional

They are not in osx. Looks like we have some cross-platform bugs

I think a simple sort should fix these issues.

@wraithgar
Copy link
Member

wraithgar commented Apr 8, 2024

I think a simple sort should fix these issues.

glob used to sort, and it doesn't anymore. As part of our update to the latest version we manually sorted its results. Looks like we missed this one.

ETA: example of us preserving glob@8 behavior

let files = await glob(`${globify(docPath)}/*/*.md`)
// preserve glob@8 behavior
files = files.sort((a, b) => a.localeCompare(b, 'en'))

@wraithgar
Copy link
Member

I suppose it was intentional

Yes, we do not await any of this from npm itself. Our tests however are awaiting the load call, which returns a promise that blocks on all the rm operations for just this situation: testing.

@wraithgar
Copy link
Member

The fs.rm is actually failing in 18.x when I test locally.

The logFiles array is built and has entries that look like this:

/Users/wraithgar/Development/npm/cli/branches/main/test/lib/utils/tap-testdir-log-file-cleans-logs/2024-04-09T06_55_00_512Z-debug-0.log/2024-04-09T06_55_00_512Z-debug-0.log

Which as you can see is wrong, it doubles the filename for some reason. Node 20 doesn't do this. There is either a bug or a breaking change in fs.readdir and the code is written only to support the way node 20 returns.

This is a readdir response in node18

        Dirent {
          name: '2024-04-09T06_00_00_299Z-debug-0.log',
          path: '/Users/wraithgar/Development/npm/cli/branches/main/test/lib/utils/tap-testdir-log-file-cleans-logs',
          [Symbol(type)]: 1
        },

This is in node 20

        Dirent {
          name: '2024-04-09T06_01_00_861Z-debug-0.log',
          parentPath: '/Users/wraithgar/Development/npm/cli/branches/main/test/lib/utils/tap-testdir-log-file-cleans-logs',
          path: '/Users/wraithgar/Development/npm/cli/branches/main/test/lib/utils/tap-testdir-log-file-cleans-logs/2024-04-09T06_01_00_861Z-debug-0.log',
          [Symbol(type)]: 1
        },

Here's an isolated example of the difference:

~/D/n/s/readdir $ nvm use v20
Now using node v20.10.0 (npm v10.5.1)
~/D/n/s/readdir $ node -e "require('fs/promises').readdir('.', { withFileTypes: true, encoding: 'utf-8', }).then(console.log)"
[ Dirent { name: 'test.js', path: '.', [Symbol(type)]: 1 } ]
~/D/n/s/readdir $ nvm use v18
Now using node v18.20.1 (npm v10.5.1)
~/D/n/s/readdir $ node -e "require('fs/promises').readdir('.', { withFileTypes: true, encoding: 'utf-8', }).then(console.log)"
[
  Dirent {
    name: 'test.js',
    parentPath: '.',
    path: 'test.js',
    [Symbol(type)]: 1
  }
]

@wraithgar
Copy link
Member

withFiletypes had a breaking change between node 18 and node 20. If I omit it the responses are identical.

Now using node v18.20.1 (npm v10.5.1)
~/D/n/s/readdir $ node -e "require('fs/promises').readdir('.', { encoding: 'utf-8', }).then(console.log)"
[ 'test.js' ]
~/D/n/s/readdir $ nvm use v20
Now using node v20.10.0 (npm v10.5.1)
~/D/n/s/readdir $ node -e "require('fs/promises').readdir('.', { encoding: 'utf-8', }).then(console.log)"
[ 'test.js' ]

@wraithgar
Copy link
Member

This gets us green in node 18 and 20. If you can't get to this before tomorrow I'll make a new PR w/ this included so it can go out w/ tomorrow's release

~/D/n/c/b/main (perf/lazy-load-workspaces|✚1) $ git diff
diff --git a/lib/utils/log-file.js b/lib/utils/log-file.js
index 62b191a0d..1a46b7da0 100644
--- a/lib/utils/log-file.js
+++ b/lib/utils/log-file.js
@@ -204,11 +204,13 @@ class LogFiles {
         // counter suffix
         .replace('-.log', '')
 
-      const files = await fs.readdir(
+      let files = await fs.readdir(
         dirname(logPath), {
           withFileTypes: true,
           encoding: 'utf-8',
         })
+      files = files.sort((a, b) => basename(a.name).localeCompare(basename(b.name), 'en'))
+
       const logFiles = []
 
       for (const file of files) {
@@ -217,7 +219,7 @@ class LogFiles {
         }
 
         const genericFileName = file.name.replace(/\d/g, 'd')
-        const filePath = join(file.path, file.name)
+        const filePath = join(dirname(logPath), basename(file.name))

@H4ad H4ad force-pushed the perf/lazy-load-workspaces branch from 3734005 to 231585f Compare April 9, 2024 20:54
@H4ad H4ad force-pushed the perf/lazy-load-workspaces branch from 231585f to b36e51d Compare April 9, 2024 20:55
@H4ad
Copy link
Contributor Author

H4ad commented Apr 9, 2024

Thanks for the detailed investigation, as soon I get more time, I will try to investigate to see if this was an expected behavior on Node 18 or if it's a bug.

@wraithgar
Copy link
Member

Failures are in windows shims tests, unrelated to this PR. Those are known flaky tests.

@wraithgar wraithgar merged commit 5a28a29 into npm:latest Apr 9, 2024
36 checks passed
@wraithgar
Copy link
Member

PHEW, we got there in the end. Thanks for sticking it through @H4ad.

@H4ad
Copy link
Contributor Author

H4ad commented Apr 9, 2024

haha no worries, happy to help!

I also created an issue on node, this is actually a bug: nodejs/node#52441

@H4ad H4ad deleted the perf/lazy-load-workspaces branch April 9, 2024 21:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants