-
Notifications
You must be signed in to change notification settings - Fork 290
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add "pull --localcache-repo" #982
Conversation
Depends: #981 |
And oh yeah, this works really well. Check this out, as mentioned in the commit msg, I'm doing a fresh mirror of FAH but reusing my host (Atomic WS) repo:
26614 objects in common out of 28518, i.e. a 93% reuse! (Which you'd obviously kind of expect since they're built from the same RPMs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pretty cool!
src/libostree/ostree-repo-pull.c
Outdated
@@ -64,6 +64,7 @@ typedef struct { | |||
GPtrArray *meta_mirrorlist; /* List of base URIs for fetching metadata */ | |||
GPtrArray *content_mirrorlist; /* List of base URIs for fetching content */ | |||
OstreeRepo *remote_repo_local; | |||
GPtrArray *repo_refs; /* Array<OstreeRepo> */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit bikeshed, but can we call this the full repo_references
? (Or maybe even repo_local_references
). Given that "refs" already means something else in this code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. Or maybe skip the git naming and call it --localcache-repo
in the cmdline and localcache_repos
in variables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that sounds much better.
src/libostree/ostree-repo-pull.c
Outdated
return FALSE; | ||
} | ||
else if (!g_hash_table_lookup (pull_data->requested_content, file_checksum)) | ||
else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this still be else if (!g_hash_table_lookup (pull_data->requested_content, file_checksum))
? If the request has already been made somehow, then we should commit to that, no?
src/libostree/ostree-repo-pull.c
Outdated
@@ -1523,6 +1556,30 @@ scan_one_metadata_object_c (OtPullData *pull_data, | |||
is_stored = TRUE; | |||
is_requested = TRUE; | |||
} | |||
/* Do we have any --reference local repos? */ | |||
else if (pull_data->repo_refs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be else if (!is_stored && pull_data->repo_refs)
?
https://bugzilla.gnome.org/show_bug.cgi?id=764735 is related here. |
In testing a bit more, one downside of this is that we're doing the reads/writes in the main thread. It'd be better to make this async. |
This is prep for a later patch; currently the logic is unchanged, but we'll need this if we make local imports async.
This is a lot like `git clone --reference`, but we chose "localcache" as the term "reference" is already used. The main use case I'm targeting this for is the Fedora Atomic Host installer case where we embed the repo content in the installer, but we may want to kickstart and download newer content. There, while we want to get a newer ref, we can still use the local repo as an object cache, since we have it sitting there in memory anyways. Another case is where one has a host ostree (say e.g. Fedora Atomic Workstation), and one wants to create a local archive mirror of FAH. Then one can use `pull --reference /ostree/repo` and pull the common objects (e.g. contents of `bash.rpm` etc.) Closes: ostreedev#975
2b9b5ee
to
a34ef4c
Compare
Rebased and comments addressed ⬆️ |
The async thing can come later. |
src/libostree/ostree-repo-pull.c
Outdated
@@ -1523,6 +1559,30 @@ scan_one_metadata_object_c (OtPullData *pull_data, | |||
is_stored = TRUE; | |||
is_requested = TRUE; | |||
} | |||
/* Do we have any --reference local repos? */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: outdated comment.
src/libostree/ostree-repo-pull.c
Outdated
return FALSE; | ||
is_stored = TRUE; | ||
is_requested = TRUE; | ||
pull_data->n_fetched_localcache_metadata++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to suggest adding:
if (objtype == OSTREE_OBJECT_TYPE_COMMIT)
g_hash_table_add (pull_data->fetched_detached_metadata, g_strdup (tmp_checksum));
here as well... but I think it's better without it, right?
It seems safer to always fetch it from the actual remote. Not to mention, we might not be embedding detached metadata in the local cache repo for some reason (like in the Fedora ISO).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are embedding it, but not necessarily for the to-be-fetched commit obviously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixup ⬇️
This is a lot like `git clone --reference`, but we chose "localcache" as the term "reference" is already used. The main use case I'm targeting this for is the Fedora Atomic Host installer case where we embed the repo content in the installer, but we may want to kickstart and download newer content. There, while we want to get a newer ref, we can still use the local repo as an object cache, since we have it sitting there in memory anyways. Another case is where one has a host ostree (say e.g. Fedora Atomic Workstation), and one wants to create a local archive mirror of FAH. Then one can use `pull --reference /ostree/repo` and pull the common objects (e.g. contents of `bash.rpm` etc.) Closes: #975 Closes: #982 Approved by: jlebon
☀️ Test successful - status-atomicjenkins |
This came up in <ostreedev#982>; when we added more direct local importing, we did it synchronously. This was actually quite a regression when doing local pulls between different modes; in particular between a bare mode and `archive`, as we were suddenly doing gzip {de,}compression in the main thread. Down the line actually...a simpler fix is probably to change things so that the local path is really only used when we know we can hardlink; everything else would go though the fetcher codepath but with `file://`. But this isn't a lot more code, and the speed/interactivity win is large. Note we're only doing content async with this patch. We could do metadata as well; we have the object already local. But the metadata code path is messier, and metadata objects are smaller. Another area where this comes up is that in e.g. Fedora releng, most operations talk to a NetApp via NFS. So this has the classic network filesystem problem that operations that are normally cheap like `stat()` can actually have nontrivial latency. Doing as much as possible in threads is better there too.
This came up in <#982>; when we added more direct local importing, we did it synchronously. This was actually quite a regression when doing local pulls between different modes; in particular between a bare mode and `archive`, as we were suddenly doing gzip {de,}compression in the main thread. Down the line actually...a simpler fix is probably to change things so that the local path is really only used when we know we can hardlink; everything else would go though the fetcher codepath but with `file://`. But this isn't a lot more code, and the speed/interactivity win is large. Note we're only doing content async with this patch. We could do metadata as well; we have the object already local. But the metadata code path is messier, and metadata objects are smaller. Another area where this comes up is that in e.g. Fedora releng, most operations talk to a NetApp via NFS. So this has the classic network filesystem problem that operations that are normally cheap like `stat()` can actually have nontrivial latency. Doing as much as possible in threads is better there too. Closes: #1006 Approved by: jlebon
The main use case I'm targeting this for is the Fedora Atomic Host installer
case where we embed the repo content in the installer, but we may want to
kickstart and download newer content. There, while we want to get a newer ref,
we can still use the local repo as an object cache, since we have it sitting
there in memory anyways.
Another case is where one has a host ostree (say e.g. Fedora Atomic
Workstation), and one wants to create a local archive mirror of FAH. Then one
can use
pull --reference /ostree/repo
and pull the common objects (e.g.contents of
bash.rpm
etc.)Closes: #975