Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: unflake async hooks statwatcher test #30362

Closed
wants to merge 1 commit into from

Conversation

lundibundi
Copy link
Member

@lundibundi lundibundi commented Nov 11, 2019

Under high load 2 types of issues arise with this test.

  • filesystem calls gets queued even when the 'sync' is used which leads
    to async_hooks being called with the events of tmpdir clean or
    initial file write after clean.
    This is solved by counting all 'change' calls while making sure there
    is no dependency of StatWatcher's on one another and the expected
    changes are waited for.

  • some events are getting lost with the current
    clean->write->write_and_watch strategy. Specifically I observed the
    file size going from 0 (empty) to 5 ('foo++') entirely skipping 3 ('foo') even though the write
    call was there (this happened reliably on -j128).
    So I've changed the strategy to avoid additional write considering
    this still tests the hooks correctly.
    This may indicate some sort of bug in async_hooks though I'm not sure.
    Not the async hooks but the fs watcher as it was missing an event.

With these changes, this test reliably passes with -j128 --repeat 4000 on my machine. Time to test this on CI.

Closes: #21425

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • commit message follows commit guidelines

@nodejs-github-bot nodejs-github-bot added async_hooks Issues and PRs related to the async hooks subsystem. test Issues and PRs related to the tests. labels Nov 11, 2019
@lundibundi
Copy link
Member Author

Could someone help with node-stress-single-test, what would be the correct GIT_REMOTE_REF for this PR? (refs/pull/30362/head?)
I wanted to run tests on win2016 and ubuntu1604x64 for async-hooks/test-statwatcher.

@Trott
Copy link
Member

Trott commented Nov 12, 2019

Thank you for this!!!!

refs/pull/30362/head should be the correct GIT_REMOTE_REF. If I did it right, it should be running at https://ci.nodejs.org/job/node-stress-single-test/15/.

@lundibundi
Copy link
Member Author

Well, looks like it was successful.
Though I'm still debating whether my w1HookCount/w2HookCount is a good solution to a different amount of events we get. In a current workflow, we can still either get no_file->file_write (99% of time) or no_file->file_creation->file_write (got this 3 times under -j128 --repeat 4000).

@Trott
Copy link
Member

Trott commented Nov 12, 2019

@nodejs/testing @nodejs/async_hooks This could use some reviews.

@lundibundi
Copy link
Member Author

ping @nodejs/testing @nodejs/async_hooks

@Trott
Copy link
Member

Trott commented Nov 27, 2019

@nodejs/collaborators

@nodejs-github-bot
Copy link
Collaborator

@Trott Trott added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Dec 21, 2019
@richardlau
Copy link
Member

If this unflakes the test shouldn’t it be removed from async-hooks.status?

Under high load 2 types of issues arise with this test.
* filesystem calls gets queued even when the 'sync' is used which leads
  to async_hooks being called with the events of tmpdir clean or
  initial file write after clean.
  This is solved by counting all 'change' calls while making sure there
  is no dependency of StatWatcher's on one another and the expected
  changes are waited for.

* some events are getting lost with the current
  clean->write->write_and_watch strategy. Specifically I observed the
  file size going from 0 to 5 entirely skipping 3 even though the write
  call was there (this happened reliably on -j128).
  So I've changed the strategy to avoid additional write considering
  this still tests the hooks correctly.
  This may indicate some sort of bug in async_hooks though I'm not sure.

Closes: nodejs#21425
@lundibundi lundibundi force-pushed the fix-stat-watcher-test branch from a4e124f to 33b8300 Compare December 21, 2019 12:02
@lundibundi
Copy link
Member Author

@richardlau done. I had to rebase (and force push) as this change wasn't present at the time.

@nodejs-github-bot
Copy link
Collaborator

nodejs-github-bot commented Dec 21, 2019

@nodejs-github-bot
Copy link
Collaborator

@Trott
Copy link
Member

Trott commented Dec 22, 2019

Landed in ea6a457

@Trott Trott closed this Dec 22, 2019
Trott pushed a commit that referenced this pull request Dec 22, 2019
Under high load 2 types of issues arise with this test.
* filesystem calls gets queued even when the 'sync' is used which leads
  to async_hooks being called with the events of tmpdir clean or
  initial file write after clean.
  This is solved by counting all 'change' calls while making sure there
  is no dependency of StatWatcher's on one another and the expected
  changes are waited for.

* some events are getting lost with the current
  clean->write->write_and_watch strategy. Specifically I observed the
  file size going from 0 to 5 entirely skipping 3 even though the write
  call was there (this happened reliably on -j128).
  So I've changed the strategy to avoid additional write considering
  this still tests the hooks correctly.
  This may indicate some sort of bug in async_hooks though I'm not sure.

Closes: #21425

PR-URL: #30362
Fixes: #21425
Reviewed-By: Stephen Belanger <admin@stephenbelanger.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Vladimir de Turckheim <vlad2t@hotmail.com>
BridgeAR pushed a commit that referenced this pull request Jan 3, 2020
Under high load 2 types of issues arise with this test.
* filesystem calls gets queued even when the 'sync' is used which leads
  to async_hooks being called with the events of tmpdir clean or
  initial file write after clean.
  This is solved by counting all 'change' calls while making sure there
  is no dependency of StatWatcher's on one another and the expected
  changes are waited for.

* some events are getting lost with the current
  clean->write->write_and_watch strategy. Specifically I observed the
  file size going from 0 to 5 entirely skipping 3 even though the write
  call was there (this happened reliably on -j128).
  So I've changed the strategy to avoid additional write considering
  this still tests the hooks correctly.
  This may indicate some sort of bug in async_hooks though I'm not sure.

Closes: #21425

PR-URL: #30362
Fixes: #21425
Reviewed-By: Stephen Belanger <admin@stephenbelanger.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Vladimir de Turckheim <vlad2t@hotmail.com>
@BridgeAR BridgeAR mentioned this pull request Jan 7, 2020
targos pushed a commit that referenced this pull request Jan 14, 2020
Under high load 2 types of issues arise with this test.
* filesystem calls gets queued even when the 'sync' is used which leads
  to async_hooks being called with the events of tmpdir clean or
  initial file write after clean.
  This is solved by counting all 'change' calls while making sure there
  is no dependency of StatWatcher's on one another and the expected
  changes are waited for.

* some events are getting lost with the current
  clean->write->write_and_watch strategy. Specifically I observed the
  file size going from 0 to 5 entirely skipping 3 even though the write
  call was there (this happened reliably on -j128).
  So I've changed the strategy to avoid additional write considering
  this still tests the hooks correctly.
  This may indicate some sort of bug in async_hooks though I'm not sure.

Closes: #21425

PR-URL: #30362
Fixes: #21425
Reviewed-By: Stephen Belanger <admin@stephenbelanger.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Vladimir de Turckheim <vlad2t@hotmail.com>
BethGriggs pushed a commit that referenced this pull request Feb 6, 2020
Under high load 2 types of issues arise with this test.
* filesystem calls gets queued even when the 'sync' is used which leads
  to async_hooks being called with the events of tmpdir clean or
  initial file write after clean.
  This is solved by counting all 'change' calls while making sure there
  is no dependency of StatWatcher's on one another and the expected
  changes are waited for.

* some events are getting lost with the current
  clean->write->write_and_watch strategy. Specifically I observed the
  file size going from 0 to 5 entirely skipping 3 even though the write
  call was there (this happened reliably on -j128).
  So I've changed the strategy to avoid additional write considering
  this still tests the hooks correctly.
  This may indicate some sort of bug in async_hooks though I'm not sure.

Closes: #21425

PR-URL: #30362
Fixes: #21425
Reviewed-By: Stephen Belanger <admin@stephenbelanger.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Reviewed-By: Vladimir de Turckheim <vlad2t@hotmail.com>
@MylesBorins MylesBorins mentioned this pull request Feb 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
async_hooks Issues and PRs related to the async hooks subsystem. author ready PRs that have at least one approval, no pending requests for changes, and a CI started. test Issues and PRs related to the tests.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Investigate flaky async-hooks/test-statwatcher
6 participants