Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add "pull --localcache-repo" #982

Closed
wants to merge 4 commits into from

Conversation

cgwalters
Copy link
Member

The main use case I'm targeting this for is the Fedora Atomic Host installer
case where we embed the repo content in the installer, but we may want to
kickstart and download newer content. There, while we want to get a newer ref,
we can still use the local repo as an object cache, since we have it sitting
there in memory anyways.

Another case is where one has a host ostree (say e.g. Fedora Atomic
Workstation), and one wants to create a local archive mirror of FAH. Then one
can use pull --reference /ostree/repo and pull the common objects (e.g.
contents of bash.rpm etc.)

Closes: #975

@cgwalters
Copy link
Member Author

Depends: #981

@cgwalters
Copy link
Member Author

cgwalters commented Jun 29, 2017

And oh yeah, this works really well. Check this out, as mentioned in the commit msg, I'm doing a fresh mirror of FAH but reusing my host (Atomic WS) repo:

[root@localhost repomirror]# ostree --repo=repo-test-reference pull --mirror -R /srv/host/ostree fah-25 fedora-atomic/25/x86_64/docker-host



GPG: Verification enabled, found 1 signature:

  Signature made Sun 25 Jun 2017 09:08:34 PM UTC using RSA key ID 4089D8F2FDB19C98
  Good signature from "Fedora 25 Primary <fedora-25-primary@fedoraproject.org>"

459 metadata, 1445 content objects fetched (6433 meta, 20181 content local); 204282 KiB transferred in 114 seconds

26614 objects in common out of 28518, i.e. a 93% reuse!

(Which you'd obviously kind of expect since they're built from the same RPMs)

Copy link
Member

@jlebon jlebon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty cool!

@@ -64,6 +64,7 @@ typedef struct {
GPtrArray *meta_mirrorlist; /* List of base URIs for fetching metadata */
GPtrArray *content_mirrorlist; /* List of base URIs for fetching content */
OstreeRepo *remote_repo_local;
GPtrArray *repo_refs; /* Array<OstreeRepo> */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit bikeshed, but can we call this the full repo_references? (Or maybe even repo_local_references). Given that "refs" already means something else in this code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. Or maybe skip the git naming and call it --localcache-repo in the cmdline and localcache_repos in variables?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that sounds much better.

return FALSE;
}
else if (!g_hash_table_lookup (pull_data->requested_content, file_checksum))
else
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this still be else if (!g_hash_table_lookup (pull_data->requested_content, file_checksum))? If the request has already been made somehow, then we should commit to that, no?

@@ -1523,6 +1556,30 @@ scan_one_metadata_object_c (OtPullData *pull_data,
is_stored = TRUE;
is_requested = TRUE;
}
/* Do we have any --reference local repos? */
else if (pull_data->repo_refs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be else if (!is_stored && pull_data->repo_refs) ?

@cgwalters
Copy link
Member Author

https://bugzilla.gnome.org/show_bug.cgi?id=764735 is related here.

@cgwalters
Copy link
Member Author

In testing a bit more, one downside of this is that we're doing the reads/writes in the main thread. It'd be better to make this async.

This is prep for a later patch; currently the logic is unchanged, but we'll need
this if we make local imports async.
This is a lot like `git clone --reference`, but we chose "localcache" as the
term "reference" is already used.

The main use case I'm targeting this for is the Fedora Atomic Host installer
case where we embed the repo content in the installer, but we may want to
kickstart and download newer content. There, while we want to get a newer ref,
we can still use the local repo as an object cache, since we have it sitting
there in memory anyways.

Another case is where one has a host ostree (say e.g. Fedora Atomic
Workstation), and one wants to create a local archive mirror of FAH. Then one
can use `pull --reference /ostree/repo` and pull the common objects (e.g.
contents of `bash.rpm` etc.)

Closes: ostreedev#975
@cgwalters
Copy link
Member Author

Rebased and comments addressed ⬆️

@cgwalters cgwalters changed the title Add "pull --reference" Add "pull --localcache-repo" Jun 29, 2017
@cgwalters
Copy link
Member Author

The async thing can come later.

@@ -1523,6 +1559,30 @@ scan_one_metadata_object_c (OtPullData *pull_data,
is_stored = TRUE;
is_requested = TRUE;
}
/* Do we have any --reference local repos? */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: outdated comment.

return FALSE;
is_stored = TRUE;
is_requested = TRUE;
pull_data->n_fetched_localcache_metadata++;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to suggest adding:

if (objtype == OSTREE_OBJECT_TYPE_COMMIT)
  g_hash_table_add (pull_data->fetched_detached_metadata, g_strdup (tmp_checksum));

here as well... but I think it's better without it, right?

It seems safer to always fetch it from the actual remote. Not to mention, we might not be embedding detached metadata in the local cache repo for some reason (like in the Fedora ISO).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are embedding it, but not necessarily for the to-be-fetched commit obviously.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixup ⬇️

@jlebon
Copy link
Member

jlebon commented Jun 30, 2017

@rh-atomic-bot r+ a39f35e

@rh-atomic-bot
Copy link

⌛ Testing commit a39f35e with merge 4273e67...

rh-atomic-bot pushed a commit that referenced this pull request Jun 30, 2017
This is a lot like `git clone --reference`, but we chose "localcache" as the
term "reference" is already used.

The main use case I'm targeting this for is the Fedora Atomic Host installer
case where we embed the repo content in the installer, but we may want to
kickstart and download newer content. There, while we want to get a newer ref,
we can still use the local repo as an object cache, since we have it sitting
there in memory anyways.

Another case is where one has a host ostree (say e.g. Fedora Atomic
Workstation), and one wants to create a local archive mirror of FAH. Then one
can use `pull --reference /ostree/repo` and pull the common objects (e.g.
contents of `bash.rpm` etc.)

Closes: #975

Closes: #982
Approved by: jlebon
@rh-atomic-bot
Copy link

☀️ Test successful - status-atomicjenkins
Approved by: jlebon
Pushing 4273e67 to master...

cgwalters added a commit to cgwalters/ostree that referenced this pull request Jul 14, 2017
This came up in <ostreedev#982>; when
we added more direct local importing, we did it synchronously.

This was actually quite a regression when doing local pulls between different
modes; in particular between a bare mode and `archive`, as we were suddenly
doing gzip {de,}compression in the main thread.

Down the line actually...a simpler fix is probably to change things so that the
local path is really only used when we know we can hardlink; everything else
would go though the fetcher codepath but with `file://`.

But this isn't a lot more code, and the speed/interactivity win is large.

Note we're only doing content async with this patch. We could do metadata as
well; we have the object already local. But the metadata code path is messier,
and metadata objects are smaller.

Another area where this comes up is that in e.g. Fedora releng, most operations
talk to a NetApp via NFS. So this has the classic network filesystem problem
that operations that are normally cheap like `stat()` can actually have
nontrivial latency. Doing as much as possible in threads is better there too.
rh-atomic-bot pushed a commit that referenced this pull request Jul 18, 2017
This came up in <#982>; when
we added more direct local importing, we did it synchronously.

This was actually quite a regression when doing local pulls between different
modes; in particular between a bare mode and `archive`, as we were suddenly
doing gzip {de,}compression in the main thread.

Down the line actually...a simpler fix is probably to change things so that the
local path is really only used when we know we can hardlink; everything else
would go though the fetcher codepath but with `file://`.

But this isn't a lot more code, and the speed/interactivity win is large.

Note we're only doing content async with this patch. We could do metadata as
well; we have the object already local. But the metadata code path is messier,
and metadata objects are smaller.

Another area where this comes up is that in e.g. Fedora releng, most operations
talk to a NetApp via NFS. So this has the classic network filesystem problem
that operations that are normally cheap like `stat()` can actually have
nontrivial latency. Doing as much as possible in threads is better there too.

Closes: #1006
Approved by: jlebon
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants