fsmonitor updates for improved performance #212

Merged (3 commits) on Nov 21, 2019

@kewillford (Member) commented Oct 22, 2019

This change does two main things.

  1. Adds a check for the CE_FSMONITOR_VALID flag in the ce_uptodate macro so that whenever the code is checking if any entry is up to date the fsmonitor flag will be taken into consideration.
  2. When unpacking trees keep the fsmonitor data so that the next command does not have to pay the price to check all the entries.
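
To make the first point concrete, here is a minimal sketch of what folding the fsmonitor flag into `ce_uptodate` could look like. The bit positions and the trimmed-down `struct cache_entry` are assumptions for illustration and are not copied from the patch; `count_needing_lstat` is a hypothetical helper.

```c
/* Illustrative bit positions (assumed); the real values live in Git's headers. */
#define CE_UPTODATE        (1u << 18)
#define CE_FSMONITOR_VALID (1u << 21)

/* Trimmed-down stand-in for Git's struct cache_entry. */
struct cache_entry {
	unsigned int ce_flags;
	const char *name;
};

/* An entry counts as up to date if either flag is set. */
#define ce_uptodate(ce) \
	(((ce)->ce_flags & (CE_UPTODATE | CE_FSMONITOR_VALID)) != 0)

/* Hypothetical helper: count entries that would still need an lstat(). */
static int count_needing_lstat(const struct cache_entry *entries, int nr)
{
	int i, count = 0;

	for (i = 0; i < nr; i++)
		if (!ce_uptodate(&entries[i]))
			count++;
	return count;
}
```

With a check like this, every caller of `ce_uptodate` skips entries that fsmonitor has vouched for, without each call site needing to know about fsmonitor.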

@kewillford kewillford force-pushed the test_status_perf branch 5 times, most recently from 7b90325 to 2da1e29 Compare October 30, 2019 17:42
@kewillford kewillford changed the title [WIP] status after unpack_trees fsmonitor updates for improved performance Oct 31, 2019
@kewillford kewillford marked this pull request as ready for review October 31, 2019 17:40
@kewillford (Member, Author) commented:

Here are some of the performance differences with these changes

| Command | Previous duration (s) | Current duration (s) | Difference (s) | Percent change |
| --- | --- | --- | --- | --- |
| checkout after folderplaceholder enumeration | 17.40225 | 3.21683 | -14.18542 | -81.51486 |
| status 1 after checkout feature/gvfs/perftest/defaultBranch | 15.53006 | 3.03953 | -12.49053 | -80.42809 |
| status 1 after checkout -b user/me/topic feature/gvfs/perftest/defaultBranch~1 | 15.66574 | 3.09206 | -12.57368 | -80.26228 |
| status 1 after pull --ff-only origin feature/gvfs/perftest/checkoutBranch | 16.08161 | 3.62207 | -12.45954 | -77.47694 |
| status 1 after checkout -b pullTest feature/gvfs/perftest/checkoutBranch~10 | 15.61599 | 3.63645 | -11.97954 | -76.71329 |
| status 1 after merge feature/gvfs/perftest/defaultBranch --no-commit | 13.1538 | 3.17194 | -9.98186 | -75.88575 |
| git reset --hard HEAD | 26.76172 | 6.68994 | -20.07178 | -75.00183 |
| status 1 after reset --hard HEAD | 12.11019 | 3.15032 | -8.95987 | -73.98621 |
| status 1 after checkout feature/gvfs/perftest/checkoutBranch | 17.4508 | 4.88049 | -12.57031 | -72.03286 |
| status 1 after stash | 10.89549 | 3.09768 | -7.79781 | -71.56915 |
| git add --all | 30.97013 | 10.36075 | -20.60938 | -66.54599 |
| status 2 after First ReadFiles | 4.35711 | 1.60531 | -2.7518 | -63.15654 |
| git stash | 31.56066 | 11.86821 | -19.69245 | -62.39556 |
| git merge feature/gvfs/perftest/defaultBranch --no-commit | 21.15363 | 8.60551 | -12.54812 | -59.31899 |
| git stash pop | 19.23136 | 9.56369 | -9.66767 | -50.27034 |
| git rebase feature/gvfs/perftest/defaultBranch | 26.31549 | 13.19811 | -13.11738 | -49.84661 |
| status 1 after merge --abort | 14.68546 | 7.55865 | -7.12681 | -48.5297 |
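
The percent change column appears to be the seconds difference divided by the previous duration. The thread does not spell out the formula, so the sketch below is an assumption that happens to match the posted numbers; `percent_change` is a hypothetical helper name.

```c
/* Relative change from `previous` to `current`, in percent (assumed formula). */
static double percent_change(double previous, double current)
{
	return (current - previous) / previous * 100.0;
}
```

For the first row, `percent_change(17.40225, 3.21683)` gives roughly -81.51, in line with the reported value.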

@derrickstolee (Collaborator) commented:

We need to figure something out about how fsmonitor talks specifically to watchman. We are not robust to script-level frequency (my test is on v2.24.0-rc2).

The GIT_TEST_FSMONITOR environment variable can take a hook path, and there is an included hook for Watchman. There are numerous issues with this integration on Linux (we cannot delete repos after registering them with Watchman, so that causes many test failures), but also even the simple test_commit function doesn't work!

For example:

GIT_TRACE=1 GIT_TEST_FSMONITOR="$(pwd)/t7519/fsmonitor-watchman" ./t7060-wtstatus.sh -x -v -d -i

In test 5, the following commands are run in order:

		test_commit initial foo "" &&
		test_commit modify foo foo &&

and here are the logs for those two lines:

+ test_commit initial foo 
+ notick=
+ signoff=
+ indir=
+ test 3 != 0
+ break
+ indir=
+ file=foo
+ echo 
+ git add foo
trace: built-in: git add foo
trace: run_command: cd '/_git/git/t/trash directory.t7060-wtstatus/mdconflict'; /_git/git/t/t7519/fsmonitor-watchman 1 1572618499310242307
Adding '/_git/git/t/trash directory.t7060-wtstatus/mdconflict' to watchman's watch list.
+ test -z 
+ test_tick
+ test -z set
+ test_tick=1112912173
+ GIT_COMMITTER_DATE=1112912173 -0700
+ GIT_AUTHOR_DATE=1112912173 -0700
+ export GIT_COMMITTER_DATE GIT_AUTHOR_DATE
+ git commit -m initial
trace: built-in: git commit -m initial
trace: run_command: cd '/_git/git/t/trash directory.t7060-wtstatus/mdconflict'; /_git/git/t/t7519/fsmonitor-watchman 1 1572618499374543700
trace: run_command: git gc --auto
trace: built-in: git gc --auto
[master (root-commit) a3c5375] initial
 Author: A U Thor <author@example.com>
 1 file changed, 1 insertion(+)
 create mode 100644 foo
+ git tag initial
trace: built-in: git tag initial
+ test_commit modify foo foo
+ notick=
+ signoff=
+ indir=
+ test 3 != 0
+ break
+ indir=
+ file=foo
+ echo foo
+ git add foo
trace: built-in: git add foo
trace: run_command: cd '/_git/git/t/trash directory.t7060-wtstatus/mdconflict'; /_git/git/t/t7519/fsmonitor-watchman 1 1572618499425203804
+ test -z 
+ test_tick
+ test -z set
+ test_tick=1112912233
+ GIT_COMMITTER_DATE=1112912233 -0700
+ GIT_AUTHOR_DATE=1112912233 -0700
+ export GIT_COMMITTER_DATE GIT_AUTHOR_DATE
+ git commit -m modify
trace: built-in: git commit -m modify
trace: run_command: cd '/_git/git/t/trash directory.t7060-wtstatus/mdconflict'; /_git/git/t/t7519/fsmonitor-watchman 1 1572618499517349606
On branch master
nothing to commit, working tree clean

The second git commit -m modify fails because the second git add foo did nothing. It triggered the watchman call, but did not receive a change for that file, so did not update the index. (I verified this by adding an extra git status call in the test_commit code and re-running the test.)

We will need to think about how to make our Watchman integration more robust and set up some automation to run the test suite with Watchman specifically.

cc: @kewillford, @jrbriggs, @wilbaker, @dscho

@jeffhostetler commented Nov 1, 2019

@derrickstolee Can you run this with GIT_TRACE_FSMONITOR turned on too? Not sure if that'll help or not, but worth a shot.

D'oh, I just noticed that you did have that turned on in the above command line. Is there anything in that other log file?

@dscho (Member) left a comment:

Looks good!

I asked for a few clarifications, and suggest splitting the change that copies the last_update when copying an index into its own commit, but none of these are super-important.

But please add your sign-off to the commit messages.

t/t7113-post-index-change-hook.sh (outdated, resolved)
t/t7519-status-fsmonitor.sh (resolved)
@@ -164,6 +166,8 @@ EOF

# test that newly added files are marked valid
test_expect_success 'newly added files are marked valid' '
write_script .git/hooks/fsmonitor-test<<-\EOF &&
EOF

Member:

I guess I do not understand this diff hunk. Previously, we let the test fsmonitor tell Git to look at the new files, but now we don't?

Is this a diff hunk that is not strictly necessary for the test case to pass, but merely makes the test more accurate, by preventing fsmonitor from triggering lstat()s on the three new files and instead forcing it to rely on the implicit information given by the git add commands?

Member, Author:

Thanks for pointing this out. The issue is that with these changes the fsmonitor data is refreshed whenever any command reads the index. Left unchanged, git ls-files would refresh using the dirty files from the previous test, which include the newly added files; those would be marked as dirty, and the test would fail. Perhaps I can use test-tool dump-fsmonitor to get the bitmap to compare against instead of using ls-files.

Member:

Follow-up question here: what behavior is this test specifically trying to validate?

The comment says:

test that newly added files are marked valid

But based on the change you made it seems like newly added files will not always be marked as valid (i.e. if they were dirty before they will still be considered dirty after the git add).

Is there something I'm missing (e.g. was the test .git/hooks/fsmonitor-test behaving incorrectly, and that's why it needs to be set to empty at the start of the test)?

Member, Author:

Maybe for this test the

write_script .git/hooks/fsmonitor-test<<-\EOF &&
EOF

needs to be before the git ls-files, so that the git add runs with the paths in .git/hooks/fsmonitor-test. But for each git add, only the single path for the added file would need to be in .git/hooks/fsmonitor-test; otherwise the last git add would mark the other paths as dirty and save that index out.

I'm taking a closer look at the tests because, before, the content of .git/hooks/fsmonitor-test did not affect commands that were not refreshing cache entries, whereas now any command that reads the index will mark as dirty the entries listed in .git/hooks/fsmonitor-test. This means that even git ls-files, which is used to validate CE_FSMONITOR_VALID, can be affected by what is in .git/hooks/fsmonitor-test. So if all the paths are left in .git/hooks/fsmonitor-test, those paths will be dirty for every git command that reads the index.

So in most cases we need to make sure .git/hooks/fsmonitor-test is empty for any validating commands like git ls-files so that the contents of that file will not change what cache entries are dirty.

I might also try using test-dump-fsmonitor, since it only dumps the index entries before applying the bitmap or the .git/hooks/fsmonitor-test paths.
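
The effect described above can be sketched as follows: on every index read, each path the hook reports loses its CE_FSMONITOR_VALID bit, so an empty hook leaves the flags untouched. This is a simplified model under stated assumptions, not Git's actual refresh code; the struct, the flag value, and `mark_reported_paths_dirty` are hypothetical.

```c
#include <string.h>

/* Illustrative bit position (assumed). */
#define CE_FSMONITOR_VALID (1u << 21)

/* Trimmed-down stand-in for Git's struct cache_entry. */
struct cache_entry {
	unsigned int ce_flags;
	const char *name;
};

/*
 * Clear CE_FSMONITOR_VALID on every entry whose path was reported by
 * the hook. With nr_reported == 0 (an empty hook script), no entry
 * changes, which is why the tests empty the hook before validating.
 */
static void mark_reported_paths_dirty(struct cache_entry *entries, int nr,
				      const char **reported, int nr_reported)
{
	int i, j;

	for (i = 0; i < nr; i++)
		for (j = 0; j < nr_reported; j++)
			if (!strcmp(entries[i].name, reported[j]))
				entries[i].ce_flags &= ~CE_FSMONITOR_VALID;
}
```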

Member:

> for each git add the single path for the added file would need to be in .git/hooks/fsmonitor-test otherwise the last git add would mark the other paths as dirty and save that index out.

Ahh, this is the part I was missing, thanks!

dirty_repo &&
git add . &&
write_script .git/hooks/fsmonitor-test<<-\EOF &&
EOF

Member:

So basically the change in 'newly added files are marked valid' made it so that the fsmonitor is not allowed to tell Git to look at any file's stat in all the test cases up until here. Hmm. I would really like to have at least a paragraph in the commit message providing a compelling argument why that is a good thing.

And then I really don't understand why we have to have the full fsmonitor-test script again, but only for dirty_repo and for git add .?

unpack-trees.c (resolved)
fsmonitor.c (3 resolved threads)

@wilbaker (Member) left a comment:

Changes look good, same questions as @dscho on the updates to the tests.

@kewillford kewillford force-pushed the test_status_perf branch 2 times, most recently from 4539b52 to 843cf19 Compare November 6, 2019 15:43
@derrickstolee (Collaborator) commented:

@kewillford when you get this rebased on top of features/sparse-checkout-2.24.0 then I can launch the C# tests using watchman for additional confidence.

@derrickstolee derrickstolee changed the base branch from features/sparse-checkout-2.23.0 to features/sparse-checkout-2.24.0 November 6, 2019 18:58
@derrickstolee derrickstolee changed the base branch from features/sparse-checkout-2.24.0 to vfs-2.24.0 November 12, 2019 18:54
@derrickstolee (Collaborator) commented:

@kewillford I kicked off a new build for you here.

3444ec2 ("fsmonitor: don't fill bitmap with entries to be removed",
2019-10-11) added a handful of sanity checks that make sure that a
bit position in fsmonitor bitmap does not go beyond the end of the
index.  As each bit in the bitmap corresponds to a path in the
index, this is the right check most of the time.

Except for the case when we are in the split-index mode and looking
at a delta index that is to be overlaid on the base index but
before the base index has actually been merged in, namely in read_
and write_fsmonitor_extension().  In these codepaths, the entries in
the split/delta index are typically a small subset of the entire set
of paths (otherwise why would we be using split-index?), so the
bitmap used by the fsmonitor is almost always larger than the number
of entries in the partial index, and the incorrect comparison would
trigger the BUG().

Signed-off-by: Kevin Willford <Kevin.Willford@microsoft.com>
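
The reasoning in the commit message above can be sketched as a size check that is skipped in split-index mode. The struct fields and `fsmonitor_bitmap_size_ok` are hypothetical names chosen for illustration; the real check in Git calls BUG() rather than returning a status.

```c
#include <stddef.h>

/* Trimmed-down, hypothetical stand-in for Git's struct index_state. */
struct index_state {
	unsigned int cache_nr;  /* entries in this (possibly delta) index */
	int split_index;        /* non-zero while in split-index mode */
	size_t fsmonitor_bits;  /* size of the fsmonitor dirty bitmap */
};

/*
 * Return 1 if the bitmap size is plausible for this index, 0 if it
 * points at a bug. In split-index mode the delta index holds only a
 * subset of the paths, so a larger bitmap is expected and accepted.
 */
static int fsmonitor_bitmap_size_ok(const struct index_state *istate)
{
	if (istate->split_index)
		return 1;
	return istate->fsmonitor_bits <= istate->cache_nr;
}
```
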
When using fsmonitor the CE_FSMONITOR_VALID flag should be checked when
wanting to know if the entry has been updated. If the flag is set the
entry should be considered up to date and the same as if the CE_UPTODATE
is set.

In order to trust the CE_FSMONITOR_VALID flag, the fsmonitor data needs to
be refreshed when the fsmonitor bitmap is applied to the index in
tweak_fsmonitor. Since the fsmonitor data is kept up to date for every
command, some tests needed to be updated to take that into account.

istate->untracked->use_fsmonitor was set in tweak_fsmonitor when the
fsmonitor bitmap data was loaded and is now in refresh_fsmonitor since
that is being called in tweak_fsmonitor. refresh_fsmonitor will only be
called once and any other callers should be setting it when refreshing
the fsmonitor data so that code can use the fsmonitor data when checking
untracked files.

When writing the index, fsmonitor_last_update is used to determine if
the fsmonitor bitmap should be created and the extension data written to
the index. When running through unpack-trees this is not copied to the
result index. As a result, the next git command has to do all the work
of lstating every file to determine what is clean, because all entries
in the index are marked as dirty when no fsmonitor data was saved in
the index extension.

Copying the fsmonitor_last_update to the result index will cause the
extension data for fsmonitor to be in the index for the next git command
to use.

Signed-off-by: Kevin Willford <Kevin.Willford@microsoft.com>
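
A minimal sketch of the idea in the last two paragraphs of the commit message above: the writer emits the fsmonitor extension only when fsmonitor_last_update is set, so the value has to be carried over to the result index. Only the field name fsmonitor_last_update comes from the commit message; the struct layout and both helper functions are assumptions for illustration.

```c
#include <stdint.h>

/* Trimmed-down, hypothetical stand-in for Git's struct index_state. */
struct index_state {
	uint64_t fsmonitor_last_update; /* 0 means no fsmonitor data */
};

/* Hypothetical helper: carry the fsmonitor timestamp to the result index. */
static void carry_over_fsmonitor_state(struct index_state *result,
				       const struct index_state *src)
{
	result->fsmonitor_last_update = src->fsmonitor_last_update;
}

/* Hypothetical helper: the extension is written only if a timestamp exists. */
static int should_write_fsmonitor_extension(const struct index_state *istate)
{
	return istate->fsmonitor_last_update != 0;
}
```

Without the copy, the result index would report no fsmonitor data, and the next command would fall back to checking every entry.
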
The fsmonitor script that can be used for running all the git tests
using watchman was causing some of the tests to fail because it wrote
to stderr and created some files for debugging purposes.

Add a new debug script to use with debugging and modify the other script
to remove the code that would cause tests to fail.

Signed-off-by: Kevin Willford <Kevin.Willford@microsoft.com>

dscho added a commit that referenced this pull request Jul 7, 2023
Includes these pull requests:

	#1
	#6
	#10
	#11
	#157
	#212
	#260
	#270

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>