Integrate Scalar (ported to C) into vfs-2.32.0 #366
When two `git maintenance` processes try to write the `.plist` file, we need to help them serialize their efforts. The 150ms time-out value was pulled out of thin air. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
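Serializing writers with a time-out usually means taking a lock file and waiting a bounded amount of time for it. The following is a minimal POSIX sketch of that idea, assuming a plain `O_CREAT | O_EXCL` lock file and a fixed 10ms polling interval; `acquire_lock_with_timeout()` and `release_lock()` are hypothetical names, and this does not reproduce Git's own lockfile API.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: create `lock_path` exclusively, polling until
 * `timeout_ms` has elapsed. Returns an open fd on success, -1 on failure
 * (errno == EEXIST means another process still holds the lock).
 */
static int acquire_lock_with_timeout(const char *lock_path, long timeout_ms)
{
	const long poll_ms = 10; /* arbitrary polling interval */
	long waited_ms = 0;

	for (;;) {
		int fd = open(lock_path, O_WRONLY | O_CREAT | O_EXCL, 0666);
		if (fd >= 0)
			return fd; /* we own the lock now */
		if (errno != EEXIST || waited_ms >= timeout_ms)
			return -1; /* real error, or we gave up waiting */

		struct timespec ts = { 0, poll_ms * 1000000L };
		nanosleep(&ts, NULL);
		waited_ms += poll_ms;
	}
}

/* Release the lock: close the fd and delete the lock file. */
static void release_lock(int fd, const char *lock_path)
{
	close(fd);
	unlink(lock_path);
}
```

A second `git maintenance start` would then block for at most `timeout_ms` (e.g. 150) instead of clobbering a `.plist` file that the first process is still writing.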
On macOS, we use launchctl to manage the background maintenance schedule. This uses a set of .plist files to describe the schedule, but these files are also registered with 'launchctl bootstrap'. If multiple 'git maintenance start' commands run concurrently, then they can collide while replacing these schedule files and registering them with launchctl. To avoid extra launchctl commands, check whether the .plist files exist on disk and whether they are already registered, using 'launchctl list <name>'. This command exits with code 0 if the job exists, or with code 113 if it does not. We can test this behavior using the GIT_TEST_MAINT_SCHEDULER environment variable. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
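The registration check boils down to running a command and interpreting its exit code. A minimal POSIX sketch, assuming the exit codes stated above (0 for registered, 113 for not registered); `is_registered()` is a hypothetical helper, and a real caller would pass `{"launchctl", "list", name, NULL}`.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv and map its exit code:
 *   exit 0        -> registered      (returns 1)
 *   exit 113      -> not registered  (returns 0)
 *   anything else -> error           (returns -1)
 */
static int is_registered(char *const argv[])
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		execvp(argv[0], argv);
		_exit(127); /* exec failed */
	}
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;

	switch (WEXITSTATUS(status)) {
	case 0:
		return 1;
	case 113:
		return 0;
	default:
		return -1;
	}
}
```

Only when this returns 0 (or the `.plist` file is missing) would a `launchctl bootstrap` be needed, which is how the extra launchctl invocations are avoided.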
This is the successor of #363, now with the correct target branch.
Over the course of Scalar's development, it became obvious that there is a need for a command that can gather all kinds of useful information that can help identify the most typical problems with large worktrees/repositories. The `diagnose` command is the culmination of this hard-won knowledge: it gathers the installed hooks, the config, a couple of statistics describing the data shape, among other pieces of information, and then wraps everything up in a tidy, neat `.zip` archive. Note: in the .NET version we have the luxury of a comprehensive standard library that includes basic functionality such as writing a `.zip` file. In the C version, we lack such a commodity. Rather than introducing a dependency on, say, libzip, we slightly abuse Git's `archive` command: instead of writing the `.zip` file directly, we stage the file contents in a Git index of a temporary, bare repository, let `git archive` have at it, and finally remove the temporary repository. Also note: due to the frequently spawned `git hash-object` processes, this command is quite slow on Windows. Should it turn out to be a big problem, the lack of a batch mode in the `hash-object` command could potentially be worked around by using `git fast-import` with a crafted `stdin`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
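The suggested `git fast-import` workaround would feed many blobs through a single process. This sketch writes one `blob` command in fast-import's stream format (`blob`, `mark :<n>`, `data <byte count>`, raw bytes). `write_fast_import_blob()` is a hypothetical name, and this is not what `scalar diagnose` actually does; the resulting object IDs could then be retrieved, e.g. via `--export-marks`.

```c
#include <stdio.h>
#include <string.h>

/*
 * Append one `blob` command to a git fast-import stream:
 *   blob
 *   mark :<mark>
 *   data <byte count>
 *   <raw bytes>
 * One fast-import process can ingest many such blobs, instead of
 * spawning one `git hash-object` per file.
 */
static int write_fast_import_blob(FILE *out, unsigned long mark,
				  const char *data, size_t len)
{
	if (fprintf(out, "blob\nmark :%lu\ndata %zu\n", mark, len) < 0)
		return -1;
	if (len && fwrite(data, 1, len, out) != len)
		return -1;
	/* the trailing LF is optional but keeps the stream readable */
	return fputc('\n', out) == EOF ? -1 : 0;
}
```

Because `data` carries an exact byte count, the contents can be binary and need no escaping.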
Let's start implementing the `register` command. With this commit, recommended settings are configured upon `scalar register`. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This implements Scalar's opinionated `clone` command: it tries to use a partial clone and sets up a sparse checkout by default. In contrast to `git clone`, `scalar clone` sets up the worktree in the `src/` subdirectory, to encourage a separation between the source files and the build output (which helps Git tremendously because it avoids untracked files that have to be specifically ignored when refreshing the index). Also, it registers the repository for regular, scheduled maintenance, and configures a slew of configuration settings based on the experience of the Microsoft Windows and the Microsoft Office development teams. Note: We intentionally use a slightly wasteful `set_config()` function (which does not reuse a single `strbuf`, for example, though performance _really_ does not matter here) because it is very, very convenient. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
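The convenience of a printf-style `set_config()` is that the caller formats `key=value` in one go and the helper splits it afterwards. A standalone sketch of that splitting, assuming a fresh buffer per call (the "slightly wasteful" part); `format_config_pair()` is a hypothetical name and, unlike a real `set_config()`, it does not write anything to Git's config.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Format "key=value" printf-style, then split at the first '='.
 * *key points to a malloc()ed buffer the caller must free();
 * *value points into that buffer, or is NULL if there was no '='.
 */
static int format_config_pair(char **key, const char **value,
			      const char *fmt, ...)
{
	va_list args;
	int len;

	va_start(args, fmt);
	len = vsnprintf(NULL, 0, fmt, args);
	va_end(args);
	if (len < 0 || !(*key = malloc((size_t)len + 1)))
		return -1;

	va_start(args, fmt);
	vsnprintf(*key, (size_t)len + 1, fmt, args);
	va_end(args);

	char *eq = strchr(*key, '=');
	if (eq)
		*eq++ = '\0'; /* terminate the key; the value follows '=' */
	*value = eq;
	return 0;
}
```

For example, `format_config_pair(&k, &v, "remote.%s.fetch=+refs/heads/%s:refs/remotes/%s/%s", "origin", "main", "origin", "main")` yields the key `remote.origin.fetch` and the value `+refs/heads/main:refs/remotes/origin/main`.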
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Arguably, the biggest lesson from the Scalar project is that scheduled maintenance is crucial to keep large repositories in good shape. With this commit, `scalar register` starts those scheduled maintenance tasks, and `scalar unregister` stops them. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This commit adds a simple regression test, modeled after Git's own test suite. A more comprehensive functional (or: integration) test suite can be found at https://github.com/microsoft/scalar. There is no intention to port that fuller test suite to `contrib/scalar/`; instead, it will still be used to verify the `scalar` functionality in Microsoft's Git fork. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This comes in handy during Scalar upgrades, or when config settings were messed up by mistake. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Teach the `scalar diagnose` command to gather file size information about pack files. Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
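Gathering pack file sizes amounts to walking the pack directory and calling `stat()` on each `*.pack` file. A minimal POSIX sketch, assuming the directory passed in is something like `.git/objects/pack`; `sum_pack_sizes()` is a hypothetical helper, not the function `scalar diagnose` uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Sum up the sizes of all "*.pack" files in pack_dir. Returns the number
 * of pack files found (total size in *total), or -1 if pack_dir cannot
 * be opened.
 */
static int sum_pack_sizes(const char *pack_dir, long long *total)
{
	DIR *dir = opendir(pack_dir);
	struct dirent *e;
	int count = 0;

	if (!dir)
		return -1;
	*total = 0;
	while ((e = readdir(dir))) {
		size_t len = strlen(e->d_name);
		char path[4096];
		struct stat st;

		if (len < 5 || strcmp(e->d_name + len - 5, ".pack"))
			continue; /* skip .idx, .rev, ".", ".." etc. */
		snprintf(path, sizeof(path), "%s/%s", pack_dir, e->d_name);
		if (!stat(path, &st) && S_ISREG(st.st_mode)) {
			*total += (long long)st.st_size;
			count++;
		}
	}
	closedir(dir);
	return count;
}
```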
With this patch, we start the journey from the C# project at https://github.com/microsoft/scalar to move what is left to Git's own `contrib/` directory. The idea of Scalar, and before that VFS for Git, has always been to prove that Git _can_ scale, and to upstream whatever strategies have been demonstrated to help. For example, while the virtual filesystem provided by VFS for Git helped the team developing the Windows operating system to move onto Git, it is not really an upstreamable strategy: getting it to work, and providing the required server-side support, make this not quite feasible. The Scalar project learned from that and tackled the problem with different tactics: instead of pretending to Git that the working directory is fully populated, it _specifically_ teaches Git about partial clone (which is based on VFS for Git's cache server), about sparse checkout (which VFS for Git tried to do transparently, in the file system layer), and regularly runs maintenance tasks to keep the repository in a healthy state. With partial clone, sparse checkout and `git maintenance` having been upstreamed, there is little left that `scalar.exe` does which `git.exe` cannot do. One such thing is that `scalar clone <url>` will automatically set up a partial, sparse clone, and configure known-helpful settings from the start. Let's bring this convenience directly into Git's tree. The idea here is that you can (optionally) build Scalar via `make -C contrib/scalar/`. This will build the `scalar` executable and put it into the `contrib/scalar/` subdirectory. The slightly awkward additions of the `contrib/scalar/*` bits to the top-level `Makefile` are really required: we want to link to `libgit.a`, which means that we will need to use the very same `CFLAGS` and `LDFLAGS` as the rest of Git. An early development version of this patch tried to replicate the respective conditionals in `contrib/scalar/Makefile` (just like `contrib/svn-fe/Makefile` tried to do). It turned out to be quite the whack-a-mole game: the SHA-1-related flags, the flags enabling/disabling `compat/poll/`, `compat/regex/`, `compat/win32mmap.c`, etc., based on the current platform... To put it mildly: it was a major mess. Instead, this patch makes minimal changes to the top-level `Makefile` so that the bits in `contrib/scalar/` can be compiled and linked, and adds a `contrib/scalar/Makefile` that uses the top-level `Makefile` in a most minimal way to do the actual compiling. Note: With this commit, we only establish the infrastructure; no Scalar functionality is implemented yet. We will do that incrementally over the next few commits. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The list consists simply of the repositories registered under the multi-valued `scalar.repo` config setting. Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
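Listing a multi-valued setting means printing every value of the key. A minimal POSIX sketch that reads the output of a command such as `git config --global --get-all scalar.repo`; that the entries live in the global config is an assumption here, and `list_config_values()` is a hypothetical helper rather than Scalar's implementation (which reads the config in-process).

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/*
 * Print every non-empty line produced by `command`, one value per line.
 * Returns the number of values printed, or -1 if the command could not
 * be started.
 */
static int list_config_values(const char *command)
{
	char line[4096];
	int count = 0;
	FILE *in = popen(command, "r");

	if (!in)
		return -1;
	while (fgets(line, sizeof(line), in)) {
		line[strcspn(line, "\n")] = '\0'; /* strip trailing newline */
		if (!*line)
			continue;
		printf("%s\n", line);
		count++;
	}
	if (pclose(in) < 0)
		return -1;
	return count;
}
```

Note that `git config --get-all` exits non-zero when the key is absent; this sketch simply reports zero entries in that case.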
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
For example after a Scalar upgrade, it can come in really handy if there is an easy way to reconfigure all Scalar enlistments. This new option offers this functionality. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Teach the `scalar diagnose` command to gather loose object counts. Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
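Loose objects live in two-hex-digit fan-out directories (`objects/00` through `objects/ff`), so counting them means counting the entries of those subdirectories. A minimal POSIX sketch; `count_loose_objects()` is a hypothetical helper, not the code `scalar diagnose` uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* True for the two-hex-digit fan-out directory names, "00" to "ff". */
static int is_fanout_dir(const char *name)
{
	return strlen(name) == 2 &&
	       isxdigit((unsigned char)name[0]) &&
	       isxdigit((unsigned char)name[1]);
}

/*
 * Count loose objects under objects_dir (e.g. ".git/objects").
 * Returns -1 if objects_dir cannot be opened.
 */
static long count_loose_objects(const char *objects_dir)
{
	DIR *top = opendir(objects_dir);
	struct dirent *e;
	long count = 0;

	if (!top)
		return -1;
	while ((e = readdir(top))) {
		char path[4096];
		struct stat st;
		DIR *sub;
		struct dirent *obj;

		if (!is_fanout_dir(e->d_name))
			continue; /* skips "pack", "info", ".", ".." */
		snprintf(path, sizeof(path), "%s/%s", objects_dir, e->d_name);
		if (stat(path, &st) || !S_ISDIR(st.st_mode))
			continue; /* only descend into real directories */
		if (!(sub = opendir(path)))
			continue;
		while ((obj = readdir(sub)))
			if (obj->d_name[0] != '.')
				count++; /* skip ".", ".." and other dot-files */
		closedir(sub);
	}
	closedir(top);
	return count;
}
```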
... which does not do much, yet... Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
When a user deleted an enlistment manually, let's be generous and _still_ unregister it. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This is mostly just a shim for `git maintenance`, mapping task names from the way Scalar called them to the way Git calls them. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
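A shim like this is essentially a lookup table from Scalar's task names to `git maintenance` task names. The pairs below are an assumption for illustration; the Git-side names (`commit-graph`, `prefetch`, `loose-objects`, `incremental-repack`) are real `git maintenance` tasks, while `task_map` and `map_task()` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Assumed Scalar-to-Git task-name pairs. A NULL Git task means there is
 * no `git maintenance` equivalent (e.g. Scalar's own config step).
 */
static const struct {
	const char *scalar_name;
	const char *git_task;
} task_map[] = {
	{ "config", NULL },
	{ "commit-graph", "commit-graph" },
	{ "fetch", "prefetch" },
	{ "loose-objects", "loose-objects" },
	{ "pack-files", "incremental-repack" },
};

/*
 * Look up the Git task for a Scalar task name. Returns 1 and sets
 * *git_task (possibly to NULL) if the name is known, 0 otherwise.
 */
static int map_task(const char *scalar_name, const char **git_task)
{
	for (size_t i = 0; i < sizeof(task_map) / sizeof(task_map[0]); i++)
		if (!strcmp(task_map[i].scalar_name, scalar_name)) {
			*git_task = task_map[i].git_task;
			return 1;
		}
	return 0;
}
```

The mapped name would then be passed on as `git maintenance run --task=<name>`.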
The .NET version supported running `scalar config` to reconfigure the current enlistment, and now the C port does, too. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This commit establishes the infrastructure to build the manual page for the `scalar` command. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Let's populate the manual page of `scalar` a bit. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
With this commit, `git help scalar` will open the appropriate manual or HTML page (instead of looking for `gitscalar`). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Using the built-in FSMonitor makes many common commands quite a bit faster. So let's teach the `scalar register` command to enable the built-in FSMonitor and kick-start the fsmonitor--daemon process (for convenience). For simplicity, we only support the built-in FSMonitor (and no external file system monitor such as Watchman). Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Continuing the documentation journey. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
In particular when multiple processes want to write to the config simultaneously, it would come in handy to not fail immediately when another process locked the config, but to gently try again. This will help with Scalar's functional test suite which wants to register multiple repositories for maintenance semi-simultaneously. As not all code paths calling this function read the config (e.g. `git config`), we have to read the config setting via `git_config_get_ulong()`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
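"Gently trying again" is typically done with a growing wait between attempts, bounded by the configured timeout (read as an `unsigned long`, matching `git_config_get_ulong()`). This sketch computes such a schedule, assuming plain doubling with no jitter; `next_backoff_ms()` is a hypothetical helper, and Git's lockfile code uses its own schedule, which this does not reproduce.

```c
/*
 * Given the attempt number, the time already waited and the timeout
 * (all in ms), return how long to sleep before the next attempt, or 0
 * to give up. Waits double (1, 2, 4, ... ms, capped at 65536 ms) and
 * never overshoot the remaining time.
 */
static unsigned long next_backoff_ms(unsigned long attempt,
				     unsigned long waited_ms,
				     unsigned long timeout_ms)
{
	unsigned long wait;

	if (waited_ms >= timeout_ms)
		return 0; /* out of time: report the lock failure */
	wait = attempt < 16 ? 1UL << attempt : 1UL << 16;
	if (wait > timeout_ms - waited_ms)
		wait = timeout_ms - waited_ms;
	return wait;
}
```

A caller would try to take the config lock, and on failure sleep for `next_backoff_ms(attempt, waited, timeout)` milliseconds, add that to `waited`, and retry until the function returns 0.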
	if (!git_env_bool("SCALAR_TEST_SKIP_VSTS_INFO", 0) &&
	    can_url_support_gvfs(url)) {
		cp.git_cmd = 1;
		strvec_pushl(&cp.args, "gvfs-helper", "--remote", url,
			     "endpoint", "vsts/info", NULL);
		if (!pipe_command(&cp, NULL, 0, &out, 512, NULL, 0)) {
			char *id = NULL;
			struct json_iterator it =
				JSON_ITERATOR_INIT(out.buf, get_repository_id,
						   &id);

			if (iterate_json(&it) < 0)
				warning("JSON parse error (%s)", out.buf);
			else if (id)
				cache_key = xstrfmt("id_%s", id);
			free(id);
		}
	}
So we are skipping this entirely in our functional tests? That seems a bit questionable, since this is a critical path for `scalar clone`. Is it just that `vsts/info` is prone to 500 errors or connection issues? Can we add retry logic here (or in `git-gvfs-helper`)?
It is tested in `contrib/scalar/t/`, by virtue of using `test-gvfs-protocol` as a mock server that does serve `/vsts/info`.
(I believe.)
Okay, I was wrong: `test-gvfs-protocol` does not serve that info.
But if we want to fix this, we will have to dig a lot deeper into the issue of authentication and how to provide it, because `vsts/info` requires authentication. That is why it was never tested with Scalar.NET, so testing it would be something new.
Also, please note that the only time we use `vsts/info` is to get the repository ID. It is of course still nice to have it working. And it did work in my manual tests...
I am approving because there seems to be a legitimate reason not to test the `/vsts/info` endpoint.
I'm excited to see this in the wild. We should first celebrate, and then discuss the strategy for the upstreaming process.
Scalar is, in its own words, "an opinionated repository management tool". It builds on top of Git and aims to make it easy and effortless to work with large repositories.
Originally built using .NET, with the take-home lessons from VFS for Git, Scalar provides sort of a laboratory for experimenting with tactics and strategies to help Git scale better. Many recent scalability improvements in Git originate from Scalar, for example:
While providing an experimentation lab outside of Git, the intention of the Scalar project has always been to ship its improvements into core Git (i.e. to "upstream" them). As the list above demonstrates, it worked.
It worked so well that only a few bits and pieces are not (yet) upstreamed. The remaining parts fall roughly into these categories:
- the `scalar` executable itself
- the `src/` subdirectory (which is the actual Git worktree), to encourage clear separation of tracked vs untracked files
- registering the repository for `git maintenance` upon `scalar clone` or `scalar register`
- `gvfs-helper`
While the `gvfs-helper` part is very unlikely to ever make it into core Git, the remainder can easily be contributed in the form of `contrib/scalar/`.
This Pull Request adds these parts, in a neatly-structured thicket of topic branches, and it concludes almost two months of effort by three developers.