Integrate Scalar (ported to C) into vfs-2.32.0 #366

Merged
@dscho merged 68 commits into microsoft:vfs-2.32.0 from features/scalar-2.32.0 on Jun 9, 2021

dscho (Member) commented Jun 7, 2021

Scalar is, in its own words, "an opinionated repository management tool". It builds on top of Git and aims to make it easy and effortless to work with large repositories.

Originally built in .NET, drawing on the lessons learned from VFS for Git, Scalar serves as a laboratory for experimenting with tactics and strategies to help Git scale better. Many recent scalability improvements in Git originated in Scalar, for example:

  • partial clone
  • sparse checkout (cone mode)
  • commit graphs
  • multi-pack indices
  • scheduled maintenance
  • prefetch
  • ...

While Scalar served as an experimentation lab outside of Git, the intention of the project had always been to ship its improvements into core Git (i.e. to "upstream" them). As the list above demonstrates, this worked.

It worked so well that only very few bits and pieces have not (yet) been upstreamed. The remaining parts fall roughly into these categories:

  • The scalar executable itself
  • The concept of an "enlistment", where the Git-tracked files live in the src/ subdirectory (which is the actual Git worktree), to encourage clear separation of tracked vs untracked files
  • A list of registered Scalar enlistments that is maintained independently from the list of Git repositories registered with git maintenance
  • A set of recommended config settings that get configured upon scalar clone or scalar register
  • Support for side-stepping Azure Repos' lack of partial clone support by using the GVFS protocol instead, via the gvfs-helper

While the gvfs-helper part is very unlikely to ever make it into core Git, the remainder can easily be contributed in the form of contrib/scalar/.

This Pull Request adds these parts in a neatly structured thicket of topic branches, and it concludes almost two months of effort by three developers.

dscho and others added 2 commits May 21, 2021 22:58
When two `git maintenance` processes try to write the `.plist` file, we
need to help them with serializing their efforts.

The 150ms time-out value was pulled out of thin air.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
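Serializing writers like this is typically done with a lock file: create `<path>.lock` exclusively and, while another process holds it, retry until a deadline passes. The sketch below is a standalone POSIX illustration of that idea under stated assumptions, not Git's `lockfile.c`; the helpers `now_ms()`, `acquire_lock()` and `release_lock()` and the 10ms retry interval are hypothetical, and only the 150ms figure comes from the commit message.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Milliseconds from a monotonic clock, immune to wall-clock adjustments. */
static long now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Create "<path>.lock" exclusively (O_CREAT | O_EXCL). While another process
 * holds it (EEXIST), sleep 10ms and retry until timeout_ms has elapsed.
 * Returns the lock file descriptor, or -1 on timeout or any other error.
 */
static int acquire_lock(const char *path, long timeout_ms,
			char *lock_path, size_t lock_path_len)
{
	const struct timespec delay = { 0, 10 * 1000000L };
	long deadline;
	int n;

	assert(path && lock_path);
	n = snprintf(lock_path, lock_path_len, "%s.lock", path);
	if (n < 0 || (size_t)n >= lock_path_len)
		return -1;              /* lock path would not fit */

	deadline = now_ms() + timeout_ms;
	for (;;) {
		int fd = open(lock_path, O_CREAT | O_EXCL | O_WRONLY, 0666);

		if (fd >= 0)
			return fd;      /* the lock is ours */
		if (errno != EEXIST || now_ms() >= deadline)
			return -1;      /* real error, or waited too long */
		nanosleep(&delay, NULL);
	}
}

/* Release the lock by closing and deleting the lock file. */
static void release_lock(int fd, const char *lock_path)
{
	close(fd);
	unlink(lock_path);
}
```

A caller would take the lock with `acquire_lock(plist_path, 150, buf, sizeof(buf))` before touching the `.plist` file. Git's own lockfile API goes one step further: the new contents are written into the lock file, which is then renamed over the target when the lock is committed.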
On macOS, we use launchctl to manage the background maintenance
schedule. This uses a set of .plist files to describe the schedule, but
these files are also registered with 'launchctl bootstrap'. If multiple
'git maintenance start' commands run concurrently, then they can collide
while replacing these schedule files and registering them with launchctl.

To avoid extra launchctl commands, check whether the .plist files exist on
disk and whether they are registered, using 'launchctl list <name>'.
This command exits with code 0 if the job is registered, or with code 113
if it is not.

We can test this behavior using the GIT_TEST_MAINT_SCHEDULER environment
variable.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
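The registration check boils down to running `launchctl list <name>` and interpreting its exit status (0 for registered, 113 for not registered, as stated in the commit message). Here is a minimal POSIX sketch, assuming a plain `fork()`/`execvp()`/`waitpid()` runner; the enum and helper names are hypothetical, and the actual commit goes through Git's own run-command machinery instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum plist_state { PLIST_REGISTERED, PLIST_NOT_REGISTERED, PLIST_UNKNOWN };

/* Run argv[0] with its arguments; return the exit code, or -1 on failure. */
static int run_command_exit_code(char *const argv[])
{
	int status;
	pid_t pid;

	assert(argv && argv[0]);
	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		execvp(argv[0], argv);
		_exit(127);             /* exec failed, e.g. command not found */
	}
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}

/* Interpret the exit code of `launchctl list <name>`. */
static enum plist_state classify_launchctl_list(int exit_code)
{
	switch (exit_code) {
	case 0:
		return PLIST_REGISTERED;
	case 113:
		return PLIST_NOT_REGISTERED;
	default:
		return PLIST_UNKNOWN;   /* launchctl missing or another failure */
	}
}

/* Registration state of the launchd job `name` (meaningful on macOS only). */
static enum plist_state launchd_job_state(const char *name)
{
	char *argv[] = { "launchctl", "list", (char *)name, NULL };

	return classify_launchctl_list(run_command_exit_code(argv));
}
```

Only when the state is not `PLIST_REGISTERED` (and the `.plist` file is missing or stale) would the schedule need to be rewritten and bootstrapped, which is what keeps concurrent `git maintenance start` runs from issuing redundant launchctl commands.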
dscho (Member Author) commented Jun 7, 2021

This is the successor of #363, now with the correct target branch.

@derrickstolee derrickstolee self-requested a review June 7, 2021 13:18
dscho and others added 25 commits June 9, 2021 16:35
Over the course of Scalar's development, it became obvious that there is
a need for a command that can gather all kinds of useful information
that can help identify the most typical problems with large
worktrees/repositories.

The `diagnose` command is the culmination of this hard-won knowledge: it
gathers the installed hooks, the config, a couple of statistics describing
the data shape, among other pieces of information, and then wraps
everything up in a tidy, neat `.zip` archive.

Note: in the .NET version we have the luxury of a comprehensive standard
library that includes basic functionality such as writing a `.zip` file.
In the C version, we lack such a commodity. Rather than introducing a
dependency on, say, libzip, we slightly abuse Git's `archive` command:
instead of writing the `.zip` file directly, we stage the file contents
in the Git index of a temporary, bare repository, then let `git archive`
have at it, and finally remove the temporary repository.

Also note: Due to the frequently spawned `git hash-object` processes, this
command is quite slow on Windows. Should it turn out to be a big
problem, the lack of a batch mode in the `hash-object` command could
potentially be worked around by using `git fast-import` with a crafted
`stdin`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
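The commit message floats `git fast-import` as a possible replacement for the many `git hash-object` processes. Purely to illustrate that idea (it is not what this commit implements), the sketch below writes a fast-import stream that records every collected file in a single commit, using the inline form of the `M` (filemodify) command. The ref name, the committer identity, `struct diag_file` and `write_fast_import_stream()` are hypothetical, and paths are assumed to need no quoting.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One collected diagnostic file: its path in the archive and its contents. */
struct diag_file {
	const char *path;
	const char *content;
};

/*
 * Write a fast-import stream that records all files in one commit on `ref`.
 * Each file uses the inline filemodify form:
 *   M 100644 inline <path> LF data <count> LF <raw bytes> LF
 */
static void write_fast_import_stream(FILE *out, const char *ref,
				     const struct diag_file *files, size_t nr)
{
	assert(out && ref);
	fprintf(out, "commit %s\n", ref);
	/* Fixed, hypothetical identity with an epoch timestamp (raw format). */
	fprintf(out, "committer Diagnose <diagnose@example.invalid> 0 +0000\n");
	fprintf(out, "data 0\n");               /* empty commit message */
	for (size_t i = 0; i < nr; i++) {
		size_t len = strlen(files[i].content);

		fprintf(out, "M 100644 inline %s\n", files[i].path);
		fprintf(out, "data %zu\n", len);
		fwrite(files[i].content, 1, len, out);
		fputc('\n', out);
	}
	fputc('\n', out);                       /* end of this commit */
}
```

Piping such a stream into `git fast-import` inside the temporary bare repository and then running `git archive --format=zip -o <file> <ref>` would replace one process per file with two processes in total.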
Let's start implementing the `register` command. With this commit,
recommended settings are configured upon `scalar register`.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This implements Scalar's opinionated `clone` command: it tries to use a
partial clone and sets up a sparse checkout by default. In contrast to
`git clone`, `scalar clone` sets up the worktree in the `src/`
subdirectory, to encourage a separation between the source files and the
build output (which helps Git tremendously because it avoids untracked
files that have to be specifically ignored when refreshing the index).

Also, it registers the repository for regular, scheduled maintenance,
and configures a slew of configuration settings based on the experience
of the Microsoft Windows and the Microsoft Office development teams.

Note: We intentionally use a slightly wasteful `set_config()` function
(which does not reuse a single `strbuf`, for example, though performance
_really_ does not matter here) because it is very, very convenient.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
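A rough standalone approximation of the convenience `set_config()` helper described above: it formats `key=value` from `printf`-style arguments into a freshly allocated buffer on every call (the "slightly wasteful" part), splits at the first `=`, and hands both halves to a setter. The setter `write_config_entry()` is a hypothetical stub that merely records the pair; real code would call into Git's config API, and any key used with it here is only an example.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the config writer: records the last pair set. */
static char last_key[128], last_value[128];

static int write_config_entry(const char *key, const char *value)
{
	snprintf(last_key, sizeof(last_key), "%s", key);
	snprintf(last_value, sizeof(last_value), "%s", value ? value : "");
	return 0;
}

/*
 * Format "key=value" from a printf-style template, split it at the first
 * '=', and pass both halves on. A new buffer is allocated on every call:
 * wasteful, but convenient, and performance does not matter here.
 */
static int set_config(const char *fmt, ...)
{
	va_list args;
	char *buf, *value;
	int len, res;

	assert(fmt);
	va_start(args, fmt);
	len = vsnprintf(NULL, 0, fmt, args);    /* measure the result first */
	va_end(args);
	if (len < 0 || !(buf = malloc((size_t)len + 1)))
		return -1;

	va_start(args, fmt);
	vsnprintf(buf, (size_t)len + 1, fmt, args);
	va_end(args);

	value = strchr(buf, '=');
	if (value)
		*value++ = '\0';        /* terminate the key, point at the value */
	res = write_config_entry(buf, value);   /* no '=' means no value */
	free(buf);
	return res;
}
```

The appeal is that call sites stay one-liners, e.g. `set_config("core.%s=%s", name, "true")`, at the cost of a small allocation per setting.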
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Arguably, the biggest learning from the Scalar project is that scheduled
maintenance is crucial to keep large repositories in good shape.

With this commit, `scalar register` starts those scheduled maintenance
tasks, and `scalar unregister` stops them.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This commit adds a simple regression test, modeled after Git's own
test suite.

A more comprehensive functional (or: integration) test suite can be
found at https://github.com/microsoft/scalar. There is no intention to
port that fuller test suite to `contrib/scalar/`; instead, it will still
be used to verify the `scalar` functionality in Microsoft's Git fork.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This comes in handy during Scalar upgrades, or when config settings were
messed up by mistake.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Teach the `scalar diagnose` command to gather file size information
about pack files.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
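Gathering pack file sizes amounts to listing the `pack/` directory below the object store and `stat()`ing every `*.pack` entry. A minimal POSIX sketch; `ends_with()`, `report_pack_sizes()` and the `<name> <size>` output format are hypothetical, whereas the actual `scalar diagnose` code uses Git's internal directory helpers and writes its findings into the diagnostics archive.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Does `name` end with `suffix`? */
static int ends_with(const char *name, const char *suffix)
{
	size_t n = strlen(name), s = strlen(suffix);

	return n >= s && !strcmp(name + n - s, suffix);
}

/*
 * Print "<name> <size>" for each *.pack file in `pack_dir` to `out`.
 * Returns the number of pack files found, or -1 if the directory
 * cannot be opened.
 */
static int report_pack_sizes(const char *pack_dir, FILE *out)
{
	DIR *dir;
	struct dirent *e;
	int count = 0;

	assert(pack_dir && out);
	if (!(dir = opendir(pack_dir)))
		return -1;
	while ((e = readdir(dir))) {
		char path[4096];
		struct stat st;

		if (!ends_with(e->d_name, ".pack"))
			continue;       /* skip .idx, .keep, "." and ".." */
		snprintf(path, sizeof(path), "%s/%s", pack_dir, e->d_name);
		if (stat(path, &st) < 0)
			continue;       /* vanished meanwhile, e.g. repacked */
		fprintf(out, "%s %lld\n", e->d_name, (long long)st.st_size);
		count++;
	}
	closedir(dir);
	return count;
}
```

Called as `report_pack_sizes("<objects-dir>/pack", out)`, this yields one line per pack, which is enough to spot an unusually large or fragmented set of packs.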
With this patch, we start the journey from the C# project at
https://github.com/microsoft/scalar to move what is left to Git's own
`contrib/` directory.

The idea of Scalar, and before that VFS for Git, has always been to
prove that Git _can_ scale, and to upstream whatever strategies have
been demonstrated to help.

For example, while the virtual filesystem provided by VFS for Git helped
the team developing the Windows operating system to move onto Git, it is
not really an upstreamable strategy: getting it to work, and the
required server-side support, make this not quite feasible.

The Scalar project learned from that and tackled the problem with
different tactics: instead of pretending to Git that the working
directory is fully populated, it _specifically_ teaches Git about
partial clone (which is based on VFS for Git's cache server), about
sparse checkout (which VFS for Git tried to do transparently, in the
file system layer), and regularly runs maintenance tasks to keep the
repository in a healthy state.

With partial clone, sparse checkout and `git maintenance` having been
upstreamed, there is little left that `scalar.exe` does which
`git.exe` cannot do. One such thing is that `scalar clone <url>` will
automatically set up a partial, sparse clone, and configure
known-helpful settings from the start.

Let's bring this convenience directly into Git's tree.

The idea here is that you can (optionally) build Scalar via

	make -C contrib/scalar

This will build the `scalar` executable and put it into the
contrib/scalar/ subdirectory.

The slightly awkward addition of the `contrib/scalar/*` bits to the
top-level `Makefile` is actually required: we want to link to
`libgit.a`, which means that we need to use the very same `CFLAGS`
and `LDFLAGS` as the rest of Git.

An early development version of this patch tried to replicate the
respective conditionals in `contrib/scalar/Makefile` (just like
`contrib/svn-fe/Makefile` tried to do). It turned out to be quite the
whack-a-mole game: the SHA-1-related flags, the flags enabling/disabling
`compat/poll/`, `compat/regex/`, `compat/win32mmap.c`, etc., based on the
current platform... To put it mildly: it was a major mess.

Instead, this patch makes minimal changes to the top-level `Makefile` so
that the bits in `contrib/scalar/` can be compiled and linked, and
adds a `contrib/scalar/Makefile` that uses the top-level `Makefile` in a
most minimal way to do the actual compiling.

Note: With this commit, we only establish the infrastructure; no
Scalar functionality is implemented yet. We will do that incrementally
over the next few commits.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The list consists simply of the repositories registered under the
multi-valued scalar.repo config setting.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
For example after a Scalar upgrade, it can come in really handy if there
is an easy way to reconfigure all Scalar enlistments. This new option
offers this functionality.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Teach the `scalar diagnose` command to gather loose object counts.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
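Loose objects are stored as `<objects-dir>/xx/<rest>`, where `xx` are the first two hex digits of the object ID, so counting them means walking the 256 possible fan-out directories. A hypothetical POSIX sketch (not the code from the commit) that counts only regular files whose names have the shape of the remaining object ID digits, so temporary files are ignored:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Loose object file names: 38 hex digits (SHA-1) or 62 (SHA-256). */
static int is_loose_object_name(const char *name)
{
	size_t len = strlen(name);

	return (len == 38 || len == 62) &&
	       strspn(name, "0123456789abcdef") == len;
}

/* Count loose objects by scanning the fan-out directories 00..ff. */
static long count_loose_objects(const char *objects_dir)
{
	long count = 0;

	assert(objects_dir);
	for (int i = 0; i < 256; i++) {
		char dir_path[4096];
		DIR *dir;
		struct dirent *e;

		snprintf(dir_path, sizeof(dir_path), "%s/%02x", objects_dir, i);
		if (!(dir = opendir(dir_path)))
			continue;       /* this fan-out directory is absent */
		while ((e = readdir(dir))) {
			char file_path[4096 + 256];
			struct stat st;

			if (!is_loose_object_name(e->d_name))
				continue;
			snprintf(file_path, sizeof(file_path), "%s/%s",
				 dir_path, e->d_name);
			if (!stat(file_path, &st) && S_ISREG(st.st_mode))
				count++;
		}
		closedir(dir);
	}
	return count;
}
```

A high count is a typical hint that maintenance (for example the `loose-objects` task) has not been running.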
... which does not do much, yet...

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
When a user deleted an enlistment manually, let's be generous and
_still_ unregister it.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This is mostly just a shim for `git maintenance`, mapping task names
from the way Scalar called them to the way Git calls them.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
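Such a shim can be pictured as a small lookup table from Scalar's task names to `git maintenance` task names. The pairs below reflect our reading of the .NET Scalar verbs and may not match the port exactly; the target names (`commit-graph`, `prefetch`, `loose-objects`, `incremental-repack`) are real `git maintenance` tasks, while `task_map` and `git_task_for()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Scalar task name -> `git maintenance run --task=<name>` task name. */
static const struct {
	const char *scalar_name;
	const char *git_task;
} task_map[] = {
	{ "commit-graph",  "commit-graph" },
	{ "fetch",         "prefetch" },
	{ "loose-objects", "loose-objects" },
	{ "pack-files",    "incremental-repack" },
};

/* Returns the git maintenance task for `name`, or NULL if unknown. */
static const char *git_task_for(const char *name)
{
	assert(name);
	for (size_t i = 0; i < sizeof(task_map) / sizeof(task_map[0]); i++)
		if (!strcmp(task_map[i].scalar_name, name))
			return task_map[i].git_task;
	return NULL;
}
```

With this table, a request for the `fetch` task would translate into `git maintenance run --task=prefetch`.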
The .NET version supported running `scalar config` to reconfigure the
current enlistment, and now the C port does, too.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This commit establishes the infrastructure to build the manual page for
the `scalar` command.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Let's populate the manual page of `scalar` a bit.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
With this commit, `git help scalar` will open the appropriate manual
or HTML page (instead of looking for `gitscalar`).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Using the built-in FSMonitor makes many common commands quite a bit
faster. So let's teach the `scalar register` command to enable the
built-in FSMonitor and kick-start the fsmonitor--daemon process (for
convenience).

For simplicity, we only support the built-in FSMonitor (and no external
file system monitor such as Watchman).

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Continuing the documentation journey.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
In particular when multiple processes want to write to the config
simultaneously, it would come in handy to not fail immediately when
another process locked the config, but to gently try again.

This will help with Scalar's functional test suite which wants to
register multiple repositories for maintenance semi-simultaneously.

As not all code paths calling this function read the config (e.g. `git
config`), we have to read the config setting via
`git_config_get_ulong()`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
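The commit does not spell out how the retries are paced. As an assumption, loosely modeled on Git's lock-file helpers (which, as we read them, wait a randomized, roughly quadratically growing interval between attempts), a retry schedule could look like the sketch below; the constants and function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define INITIAL_BACKOFF_MS 1L
#define BACKOFF_MAX_MULTIPLIER 1000L

/* Base wait before retry number `attempt` (1-based): attempt^2 ms, capped. */
static long backoff_base_ms(long attempt)
{
	long multiplier;

	assert(attempt >= 1);
	if (attempt > 1000)
		attempt = 1000;         /* avoid overflow when squaring */
	multiplier = attempt * attempt;
	if (multiplier > BACKOFF_MAX_MULTIPLIER)
		multiplier = BACKOFF_MAX_MULTIPLIER;
	return multiplier * INITIAL_BACKOFF_MS;
}

/* Randomize to between 0.75x and 1.25x so competing processes spread out. */
static long jittered_ms(long base_ms)
{
	return (750 + rand() % 500) * base_ms / 1000;
}
```

A caller would loop: try to take the config lock, and on failure sleep for `jittered_ms(backoff_base_ms(attempt))` until the configured timeout is used up. The jitter keeps processes that collided once from retrying in lockstep.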
.github/workflows/scalar-functional-tests.yml (resolved)
.github/workflows/scalar-functional-tests.yml (resolved)
Comment on lines +693 to +710
```c
	if (!git_env_bool("SCALAR_TEST_SKIP_VSTS_INFO", 0) &&
	    can_url_support_gvfs(url)) {
		cp.git_cmd = 1;
		strvec_pushl(&cp.args, "gvfs-helper", "--remote", url,
			     "endpoint", "vsts/info", NULL);
		if (!pipe_command(&cp, NULL, 0, &out, 512, NULL, 0)) {
			char *id = NULL;
			struct json_iterator it =
				JSON_ITERATOR_INIT(out.buf, get_repository_id,
						   &id);

			if (iterate_json(&it) < 0)
				warning("JSON parse error (%s)", out.buf);
			else if (id)
				cache_key = xstrfmt("id_%s", id);
			free(id);
		}
	}
```
Collaborator

So we are skipping this entirely in our functional tests? That seems a bit questionable, since this is a critical path for scalar clone.

Is it just that vsts/info is prone to 500 errors or connection issues? Can we add retry logic here (or in git-gvfs-helper)?

Member Author

It is tested in contrib/scalar/t/, by virtue of using test-gvfs-protocol as a mock server that does serve /vsts/info.

Member Author

(I believe.)

Member Author

Okay, I was wrong, test-gvfs-protocol does not serve that info.

But if we want to fix this, we will have to dig a lot deeper into the issue of authentication and how to provide it, because vsts/info requires authentication. That is why it was never tested with Scalar.NET, so testing it would be something new.

Also, please note that the only time we use vsts/info is to get the repository ID. It is of course nice to have it working, still. And it did work in my manual tests...

@derrickstolee (Collaborator) left a comment

I am approving because there seems to be a legit reason to not test the /vsts/info endpoint.

I'm excited to see this in the wild. We should first celebrate, and then discuss the strategy for the upstreaming process.

@dscho dscho merged commit 1969101 into microsoft:vfs-2.32.0 Jun 9, 2021
@dscho dscho deleted the features/scalar-2.32.0 branch June 9, 2021 21:30
derrickstolee pushed a commit that referenced this pull request Aug 3, 2021
Integrate Scalar (ported to C) into vfs-2.32.0
derrickstolee pushed a commit that referenced this pull request Aug 3, 2021
Integrate Scalar (ported to C) into vfs-2.32.0
derrickstolee pushed a commit that referenced this pull request Aug 5, 2021
Integrate Scalar (ported to C) into vfs-2.32.0
dscho added a commit that referenced this pull request Aug 5, 2021
Integrate Scalar (ported to C) into vfs-2.32.0
derrickstolee pushed a commit that referenced this pull request Aug 9, 2021
Integrate Scalar (ported to C) into vfs-2.32.0
derrickstolee pushed a commit that referenced this pull request Aug 12, 2021
Integrate Scalar (ported to C) into vfs-2.32.0
derrickstolee pushed a commit that referenced this pull request Aug 17, 2021
Integrate Scalar (ported to C) into vfs-2.32.0
dscho added a commit that referenced this pull request Oct 30, 2021
Integrate Scalar (ported to C) into vfs-2.32.0
derrickstolee pushed a commit that referenced this pull request Oct 30, 2021
Integrate Scalar (ported to C) into vfs-2.32.0
derrickstolee pushed a commit that referenced this pull request Oct 31, 2021
Integrate Scalar (ported to C) into vfs-2.32.0
derrickstolee pushed a commit that referenced this pull request Nov 4, 2021
Integrate Scalar (ported to C) into vfs-2.32.0
derrickstolee pushed a commit that referenced this pull request Nov 10, 2021
Integrate Scalar (ported to C) into vfs-2.32.0
derrickstolee pushed a commit that referenced this pull request Nov 15, 2021
Integrate Scalar (ported to C) into vfs-2.32.0
ldennington pushed a commit to ldennington/git that referenced this pull request Jan 12, 2022
Integrate Scalar (ported to C) into vfs-2.32.0
ldennington pushed a commit to ldennington/git that referenced this pull request Jan 19, 2022
Integrate Scalar (ported to C) into vfs-2.32.0
ldennington pushed a commit to ldennington/git that referenced this pull request Jan 20, 2022
Integrate Scalar (ported to C) into vfs-2.32.0
ldennington pushed a commit to ldennington/git that referenced this pull request Jan 20, 2022
Integrate Scalar (ported to C) into vfs-2.32.0
ldennington pushed a commit to ldennington/git that referenced this pull request Jan 20, 2022
Integrate Scalar (ported to C) into vfs-2.32.0
ldennington pushed a commit to ldennington/git that referenced this pull request Jan 20, 2022
Integrate Scalar (ported to C) into vfs-2.32.0
ldennington pushed a commit to ldennington/git that referenced this pull request Jan 20, 2022
Integrate Scalar (ported to C) into vfs-2.32.0
ldennington pushed a commit to ldennington/git that referenced this pull request Jan 20, 2022
Integrate Scalar (ported to C) into vfs-2.32.0
ldennington pushed a commit to ldennington/git that referenced this pull request Jan 25, 2022
Integrate Scalar (ported to C) into vfs-2.32.0
ldennington pushed a commit to ldennington/git that referenced this pull request Jan 25, 2022
Integrate Scalar (ported to C) into vfs-2.32.0
dscho added a commit that referenced this pull request Feb 1, 2022
Integrate Scalar (ported to C) into vfs-2.32.0
dscho added a commit that referenced this pull request Jun 17, 2022
Integrate Scalar (ported to C) into vfs-2.32.0
dscho added a commit that referenced this pull request Jun 17, 2022
Integrate Scalar (ported to C) into vfs-2.32.0
3 participants