Merge branch 'scalar'
This merges the upstreamable part of the Scalar patches.

Minor merge conflicts (caused by the gvfs-helper) were resolved
trivially.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
dscho committed Sep 22, 2022
2 parents abdbce0 + a532143 commit cd9e584
Showing 14 changed files with 384 additions and 11 deletions.
15 changes: 15 additions & 0 deletions .github/workflows/main.yml
@@ -91,6 +91,13 @@ jobs:
HOME: ${{runner.workspace}}
NO_PERL: 1
run: . /etc/profile && ci/make-test-artifacts.sh artifacts
- name: build Scalar
shell: bash
run: |
make -C contrib/scalar &&
mkdir -p artifacts/bin-wrappers artifacts/contrib/scalar &&
cp contrib/scalar/scalar.exe artifacts/contrib/scalar/ &&
cp bin-wrappers/scalar artifacts/bin-wrappers/
- name: zip up tracked files
run: git archive -o artifacts/tracked.tar.gz HEAD
- name: upload tracked files and build artifacts
@@ -160,6 +167,8 @@ jobs:
run: compat\vcbuild\vcpkg_copy_dlls.bat release ${{ matrix.arch }}-windows
- name: generate Visual Studio solution
shell: bash
env:
INCLUDE_SCALAR: YesPlease
run: |
cmake `pwd`/contrib/buildsystems/ -DCMAKE_PREFIX_PATH=`pwd`/compat/vcbuild/vcpkg/installed/${{ matrix.arch }}-windows \
-DNO_GETTEXT=YesPlease -DPERL_TESTS=OFF -DPYTHON_TESTS=OFF -DCURL_NO_CURL_CMAKE=ON -DCMAKE_GENERATOR_PLATFORM=${{ matrix.arch }} -DVCPKG_ARCH=${{ matrix.arch }}-windows -DHOST_CPU=${{ matrix.arch }}
@@ -173,6 +182,12 @@ jobs:
run: |
mkdir -p artifacts &&
eval "$(make -n artifacts-tar INCLUDE_DLLS_IN_ARTIFACTS=YesPlease ARTIFACTS_DIRECTORY=artifacts NO_GETTEXT=YesPlease 2>&1 | grep ^tar)"
- name: copy Scalar
shell: bash
run: |
mkdir -p artifacts/bin-wrappers artifacts/contrib/scalar &&
cp contrib/scalar/scalar.exe artifacts/contrib/scalar/ &&
cp bin-wrappers/scalar artifacts/bin-wrappers/
- name: zip up tracked files
run: git archive -o artifacts/tracked.tar.gz HEAD
- name: upload tracked files and build artifacts
9 changes: 9 additions & 0 deletions Documentation/config/core.txt
@@ -806,3 +806,12 @@ core.abbrev::
If set to "no", no abbreviation is made and the object names
are shown in their full length.
The minimum length is 4.

core.configWriteLockTimeoutMS::
When processes try to write to the config concurrently, it is likely
that one process "wins" and the other process(es) fail to lock the
config file. By configuring a timeout (in milliseconds) larger than zero,
Git can be told to try to lock the config again a couple of times within
the specified timeout. If the timeout is configured to zero (which is the
default), Git will fail immediately when the config is already
locked.
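
The retry behavior described above can be illustrated with a small shell sketch. It is not Git's implementation (that lives behind `hold_lock_file_for_update_timeout()`); the helper name `lock_with_timeout`, the fixed 50 ms polling interval, and the use of the shell's noclobber option are assumptions made purely for illustration.

```shell
# Hypothetical helper, not part of Git: try to create "<file>.lock"
# within <timeout_ms> milliseconds, polling every 50 ms.
# Assumes a `sleep` that accepts fractional seconds (GNU, BSD, BusyBox).
lock_with_timeout() {
	lock="$1.lock"
	remaining_ms="$2"
	while :
	do
		# With noclobber (set -C) the redirection fails if the lock exists,
		# so creating the file doubles as an atomic "take the lock".
		if (set -C; : >"$lock") 2>/dev/null
		then
			return 0
		fi
		# A timeout of zero means: give up after the first failed attempt.
		if test "$remaining_ms" -le 0
		then
			return 1
		fi
		sleep 0.05
		remaining_ms=$((remaining_ms - 50))
	done
}
```

To opt in to retries, a non-zero value would be configured, for example with `git config --global core.configWriteLockTimeoutMS 150` (the value 150 is an arbitrary example).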
24 changes: 19 additions & 5 deletions builtin/gc.c
@@ -1550,21 +1550,35 @@ static int maintenance_unregister(int argc, const char **argv, const char *prefi
struct option options[] = {
OPT_END(),
};
int rc;
const char *key = "maintenance.repo";
int rc = 0;
struct child_process config_unset = CHILD_PROCESS_INIT;
char *maintpath = get_maintpath();
int found = 0;
struct string_list_item *item;
const struct string_list *list = git_config_get_value_multi(key);

argc = parse_options(argc, argv, prefix, options,
builtin_maintenance_unregister_usage, 0);
if (argc)
usage_with_options(builtin_maintenance_unregister_usage,
options);

config_unset.git_cmd = 1;
strvec_pushl(&config_unset.args, "config", "--global", "--unset",
"--fixed-value", "maintenance.repo", maintpath, NULL);
for_each_string_list_item(item, list) {
if (!strcmp(maintpath, item->string)) {
found = 1;
break;
}
}

if (found) {
config_unset.git_cmd = 1;
strvec_pushl(&config_unset.args, "config", "--global", "--unset",
"--fixed-value", key, maintpath, NULL);

rc = run_command(&config_unset);
}

rc = run_command(&config_unset);
free(maintpath);
return rc;
}
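
In shell terms, the new control flow amounts to looking the path up first and only calling `git config --unset` when it is actually registered, rather than running the unset unconditionally. The sketch below is an approximation of that idea, not a translation of the C code; the function name `unregister_repo` is hypothetical, and it operates on an arbitrary config file via `--file` instead of the global config.

```shell
# Hypothetical helper: remove <repo_path> from maintenance.repo in
# <config_file>, but only if it is currently listed there.
unregister_repo() {
	config_file=$1
	repo_path=$2
	# grep -Fx matches the whole line literally, mirroring the strcmp() loop.
	if git config --file "$config_file" --get-all maintenance.repo |
		grep -Fxq -- "$repo_path"
	then
		git config --file "$config_file" --unset --fixed-value \
			maintenance.repo "$repo_path"
	fi
}
```

With this shape, unregistering a path that was never registered leaves the file untouched and returns success.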
2 changes: 2 additions & 0 deletions ci/run-build-and-tests.sh
@@ -52,4 +52,6 @@ case " $MAKE_TARGETS " in
*" all "*) make -C contrib/subtree test;;
esac

make -C contrib/scalar $MAKE_TARGETS

save_good_tree
8 changes: 6 additions & 2 deletions ci/run-test-slice.sh
@@ -10,7 +10,11 @@ group "Run tests" make --quiet -C t T="$(cd t &&
tr '\n' ' ')" ||
handle_failed_tests

# Run the git subtree tests only if main tests succeeded
test 0 != "$1" || make -C contrib/subtree test
if test 0 = "$1"
then
# Run the git subtree & scalar tests only if main tests succeeded
make -C contrib/subtree test &&
make -C contrib/scalar test
fi

check_unignored_build_artifacts
8 changes: 7 additions & 1 deletion config.c
@@ -3293,6 +3293,7 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
const char *value_pattern,
unsigned flags)
{
static unsigned long timeout_ms = ULONG_MAX;
int fd = -1, in_fd = -1;
int ret;
struct lock_file lock = LOCK_INIT;
@@ -3313,11 +3314,16 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
if (!config_filename)
config_filename = filename_buf = git_pathdup("config");

if ((long)timeout_ms < 0 &&
git_config_get_ulong("core.configWriteLockTimeoutMS", &timeout_ms))
timeout_ms = 0;

/*
* The lock serves a purpose in addition to locking: the new
* contents of .git/config will be written into it.
*/
fd = hold_lock_file_for_update(&lock, config_filename, 0);
fd = hold_lock_file_for_update_timeout(&lock, config_filename, 0,
timeout_ms);
if (fd < 0) {
error_errno(_("could not lock config file %s"), config_filename);
ret = CONFIG_NO_LOCK;
14 changes: 14 additions & 0 deletions contrib/buildsystems/CMakeLists.txt
@@ -816,6 +816,13 @@ if(CURL_FOUND)
target_link_libraries(git-gvfs-helper http_obj common-main ${CURL_LIBRARIES} )
endif()

if(DEFINED ENV{INCLUDE_SCALAR} AND NOT ENV{INCLUDE_SCALAR} STREQUAL "")
add_executable(scalar ${CMAKE_SOURCE_DIR}/contrib/scalar/scalar.c)
target_link_libraries(scalar common-main)
set_target_properties(scalar PROPERTIES RUNTIME_OUTPUT_DIRECTORY_DEBUG ${CMAKE_BINARY_DIR}/contrib/scalar)
set_target_properties(scalar PROPERTIES RUNTIME_OUTPUT_DIRECTORY_RELEASE ${CMAKE_BINARY_DIR}/contrib/scalar)
endif()

parse_makefile_for_executables(git_builtin_extra "BUILT_INS")

option(SKIP_DASHED_BUILT_INS "Skip hardlinking the dashed versions of the built-ins")
@@ -1057,6 +1064,13 @@ string(REPLACE "@@BUILD_DIR@@" "${CMAKE_BINARY_DIR}" content "${content}")
string(REPLACE "@@PROG@@" "git-cvsserver" content "${content}")
file(WRITE ${CMAKE_BINARY_DIR}/bin-wrappers/git-cvsserver ${content})

if(DEFINED ENV{INCLUDE_SCALAR} AND NOT ENV{INCLUDE_SCALAR} STREQUAL "")
file(STRINGS ${CMAKE_SOURCE_DIR}/wrap-for-bin.sh content NEWLINE_CONSUME)
string(REPLACE "@@BUILD_DIR@@" "${CMAKE_BINARY_DIR}" content "${content}")
string(REPLACE "@@PROG@@" "contrib/scalar/scalar${EXE_EXTENSION}" content "${content}")
file(WRITE ${CMAKE_BINARY_DIR}/bin-wrappers/scalar ${content})
endif()

#options for configuring test options
option(PERL_TESTS "Perform tests that use perl" ON)
option(PYTHON_TESTS "Perform tests that use python" ON)
51 changes: 51 additions & 0 deletions contrib/scalar/docs/faq.md
@@ -0,0 +1,51 @@
Frequently Asked Questions
==========================

Using Scalar
------------

### I don't want a sparse clone, I want every file after I clone!

Run `scalar clone --full-clone <url>` to initialize your repo to include
every file. You can switch to a sparse-checkout later by running
`git sparse-checkout init --cone`.

### I already cloned without `--full-clone`. How do I get everything?

Run `git sparse-checkout disable`.

Scalar Design Decisions
-----------------------

There may be many design decisions within Scalar that are confusing at first
glance. Some of them may cause friction when you use Scalar with your existing
repos and existing habits.

> Scalar has the most benefit when users design repositories
> with efficient patterns.

For example: Scalar uses the sparse-checkout feature to limit the size of the
working directory within a large monorepo. It is designed to work efficiently
with monorepos that are highly componentized, so that most developers need
far fewer files in their daily work.

### Why does `scalar clone` create a `<repo>/src` folder?

Scalar uses a file system watcher to keep track of changes under this `src` folder.
Any activity in this folder is assumed to be important to Git operations. By
creating the `src` folder, we are making it easy for your build system to
create output folders outside the `src` directory. We commonly see systems
create folders for build outputs and package downloads. Scalar itself creates
these folders during its builds.

Your build system may create build artifacts such as `.obj` or `.lib` files
next to your source code. These are commonly "hidden" from Git using
`.gitignore` files. Having such artifacts in your source tree creates
additional work for Git because it needs to look at these files and match them
against the `.gitignore` patterns.

If you follow the `src` pattern that Scalar tries to establish, placing your
build intermediates and outputs parallel to the `src` folder rather than
inside it, you help optimize Git command performance for developers in the
repository by limiting the number of files Git needs to consider for many
common operations.
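
A minimal sketch of that layout, using plain `git init` in place of `scalar clone` so that it runs without Scalar. The directory name `out` and the file `app.obj` are made up for illustration.

```shell
# Stand-in for an enlistment; `scalar clone` would create <dir>/src for you.
enlistment=$(mktemp -d)
git init -q "$enlistment/src"

# Hypothetical build output directory, parallel to src rather than inside it.
mkdir "$enlistment/out"
touch "$enlistment/out/app.obj"

# Reports untracked or modified files in the worktree; out/ lies outside it.
git -C "$enlistment/src" status --porcelain
```

The final command should print nothing: because the build output lives next to `src`, Git never has to scan it or match it against `.gitignore` patterns.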
98 changes: 98 additions & 0 deletions contrib/scalar/docs/getting-started.md
@@ -0,0 +1,98 @@
Getting Started
===============

Registering existing Git repos
------------------------------

To add a repository to the list of registered repos, run `scalar register [<path>]`.
If `<path>` is not provided, then the "current repository" is discovered from
the working directory by scanning the parent paths for a path containing a `.git`
folder, possibly inside a `src` folder.

To see which repositories are currently tracked by the service, run
`scalar list`.

Run `scalar unregister [<path>]` to remove the repo from this list.

Creating a new Scalar clone
---------------------------

The `clone` verb creates a local enlistment of a remote repository using the
partial clone feature, which is available on hosts such as GitHub.


```
scalar clone [options] <url> [<dir>]
```

Create a local copy of the repository at `<url>`. If specified, create the `<dir>`
directory and place the repository there. Otherwise, the last path component of
`<url>` will be used for `<dir>`.

At the end, the repo is located at `<dir>/src`. By default, the sparse-checkout
feature is enabled and the only files present are those in the root of your
Git repository. Use `git sparse-checkout set` to expand the set of directories
you want to see, or `git sparse-checkout disable` to expand to all files. You
can explore the subdirectories outside your sparse-checkout specification using
`git ls-tree HEAD`.

### Sparse Repo Mode

By default, Scalar reduces your working directory to only the files at the
root of the repository. You need to add the folders you care about to build up
to your working set.

* `scalar clone <url>`
  * Please choose the **Clone with HTTPS** option in the `Clone Repository` dialog in Azure Repos, not **Clone with SSH**.
* `cd <root>\src`
* At this point, your `src` directory only contains files that appear in your root
  tree. No folders are populated.
* Set the directory list for your sparse-checkout using:
  1. `git sparse-checkout set <dir1> <dir2> ...`
  2. `git sparse-checkout set --stdin < dir-list.txt`
* Run git commands as you normally would.
* To fully populate your working directory, run `git sparse-checkout disable`.

If instead you want to start with all files on-disk, you can clone with the
`--full-clone` option. To enable sparse-checkout after the fact, run
`git sparse-checkout init --cone`. This will initialize your sparse-checkout
patterns to only match the files at root.

If you are unfamiliar with what directories are available in the repository,
then you can run `git ls-tree -d --name-only HEAD` to discover the directories
at root, or `git ls-tree -d --name-only HEAD <path>` to discover the directories
in `<path>`.
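
The following sketch walks through those commands on a throwaway repository, so they can be tried without a Scalar clone. The repository layout (`docs/`, `src/lib/`) and the committer identity are invented for the example, and it assumes a Git version recent enough to provide `git sparse-checkout`.

```shell
# Create a small repository with one root file and two top-level directories.
demo=$(mktemp -d)
git init -q "$demo/repo"
cd "$demo/repo"
mkdir -p docs src/lib
echo readme >README
echo guide >docs/guide.txt
echo code >src/lib/code.c
git add .
git -c user.name=Demo -c user.email=demo@example.com commit -q -m initial

# Restrict the worktree to the files at the root, as a Scalar clone starts out.
git sparse-checkout init --cone

# Discover which directories exist, even though they are not checked out.
git ls-tree -d --name-only HEAD

# Add one directory to the working set.
git sparse-checkout set docs

# Later, to get every file back:
# git sparse-checkout disable
```

After the `set` call, `README` and `docs/` are present in the worktree while `src/` stays absent until it is added or sparse-checkout is disabled.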

### Options

These options allow a user to customize their initial enlistment.

* `--full-clone`: If specified, do not initialize the sparse-checkout feature.
All files will be present in your `src` directory. This uses a Git partial
clone: blobs are downloaded on demand.

* `--branch=<ref>`: Specify the branch to check out after cloning.

### Advanced Options

The options below are not intended for use by a typical user. These are
usually used by build machines to create a temporary enlistment that
operates on a single commit.

* `--single-branch`: Use this option to only download metadata for the branch
that will be checked out. This is helpful for build machines that target
a remote with many branches. Any `git fetch` commands after the clone will
still ask for all branches.

* `--no-prefetch`: Use this option to not prefetch commits after clone. This
is not recommended for anyone planning to use their clone for history
traversal, because it makes commands like `git log` or `git pull`
extremely slow.

Removing a Scalar Clone
-----------------------

Since the `scalar clone` command sets up a file-system watcher (when available),
that watcher could prevent deleting the enlistment. Run `scalar delete <path>`
from outside of your enlistment to unregister the enlistment from the filesystem
watcher and delete the enlistment at `<path>`.
50 changes: 50 additions & 0 deletions contrib/scalar/docs/index.md
@@ -0,0 +1,50 @@
Scalar: Enabling Git at Scale
=============================

Scalar is a tool that helps Git scale to some of the largest Git repositories.
It achieves this by enabling some advanced Git features, such as:

* *Partial clone:* reduces time to get a working repository by not
downloading all Git objects right away.

* *Background prefetch:* downloads Git object data from all remotes every
hour, reducing the amount of time for foreground `git fetch` calls.

* *Sparse-checkout:* limits the size of your working directory.

* *File system monitor:* tracks the recently modified files and eliminates
the need for Git to scan the entire worktree.

* *Commit-graph:* accelerates commit walks and reachability calculations,
speeding up commands like `git log`.

* *Multi-pack-index:* enables fast object lookups across many pack-files.

* *Incremental repack:* repacks the packed Git data into fewer pack-files
without disrupting concurrent commands by using the multi-pack-index.

By running `scalar register` in any Git repo, Scalar will automatically enable
these features for that repo (except partial clone) and start running suggested
maintenance in the background using
[the `git maintenance` feature](https://git-scm.com/docs/git-maintenance).

Repos cloned with the `scalar clone` command use partial clone to significantly
reduce the amount of data required to get started using a repository. By
delaying all blob downloads until they are required, Scalar allows you to work
with very large repositories quickly.

Documentation
-------------

* [Getting Started](getting-started.md): Get started with Scalar.
Includes `scalar register`, `scalar unregister`, `scalar clone`, and
`scalar delete`.

* [Troubleshooting](troubleshooting.md):
Collect diagnostic information or update custom settings. Includes
`scalar diagnose`.

* [The Philosophy of Scalar](philosophy.md): Why does Scalar work the way
it does, and how do we make decisions about its future?

* [Frequently Asked Questions](faq.md)