-
Notifications
You must be signed in to change notification settings - Fork 290
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Streaming static deltas #1309
Streaming static deltas #1309
Conversation
This is similar to GVariantBuilder in that it constructs variant containers, but it writes it directly to a file descriptor rather than keep the entier thing in memory. This is useful to create large variants without using a lot of memory.
This allows us to create the final delta desciptor directly on disk rather than having it all in memory. This is nice because it can become quite large if inlined parts are used. Note however, that we currently generate all the delta parts in memory before adding them to the delta, so we still keep all individual parts in memory. Fixing that is the next step.
We will do some changes later that need these earliers, so move them up.
Directly when we allocate a new part we finish the old one, writing the compressed data to a temporary file and generating the delta header for it. When all these are done we loop over them and collect the headers, sizes and either copy the tempfile data into the inlined superblock or link the tempfiles to disk with the proper names.
Note, the deltas produced were identical, sans the timestamp. |
This will help flatpak/flatpak#483 |
Gah, it failed because of the permissions on the superblock. Honestly glnx_link_tmpfile_at() should probably take a mode argument to avoid issues like this in the future. |
It is used, by unref:ing it when we go out of scope. @cgwalters Do you know how to fix this warning? |
Finally it passed CI :) |
}; | ||
|
||
static GRecMutex g_variant_type_info_lock; | ||
static GHashTable *g_variant_type_info_table; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm...won't this possibly break if there are two copies of this in process? (BTW last I measured this type info table was part of what made GVariant slow)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm hesistant a bit to do this without at least opening up a GLib bug and tracking this API addition there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would it break? Its jus a local static variable which protects a hashtable. It doesn't affect anything outside this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking the GVariant
APIs might be relying on something like pointer identity of GVariantType
or something. But yeah, it doesn't look like that's the case. And in the end we're not returning a GVariant
, so data transfer is just "one way" here.
(Just two quick comments...will look at this more today) The other thing that occurs to me here is I think there's an argument that GVariant should only be used for metadata. Basically have [metadata][payload] and the total size of the payload is in metadata. Which is how it works today for the content objects in ostree. But...that'd be a format change for deltas with the concomitant versioning issues. |
The gvariant wire format depends on knowing the total size of the container, so you cannot append arbitrary stuff after it and still work, without also adding some kind of header size field external to the gvariant. |
btw, i think the second patch in this series that adds the finish_part() code could probably go in (with minor changes) even without the streaming writes. It will be less of a gain, but still a gain. |
Yes, but for content objects (and we could make this true for delta parts), total size = stbuf.st_size - header. |
I haven't looked closely, but I have a feeling it wouldn't be hard to add this to GLib directly if rather than using |
Anyways I'm OK to try merging this, but before I do I'm playing with |
Also, for that typeinfo hashtable, i copied the code as is. But it spends a lot of work building up some kind of offset table for quickly looking up members by offset, in generate_table(). If we only write stuff that code could be deleted (as well as some parts of GVariantMemberInfo. I kept it there for now as I was thinking of possibly using this on the reader side when applying the static delta in streaming fashion. |
@cgwalters if you mmap you run into issues on 32bit writing files > 2 GB. |
Yeah...but deltas were designed so that you'd generally make them on a "real" build server (they're not arch dependent). That said I understand for making bundles one wants it to work on "small" hardware too. Anyways I tracked down the deltas issue I hit when testing this which wasn't related: #1312 |
OK, I verified this passes my FAH test case too with my 2 other PRs applied. |
This allows us to create the final delta desciptor directly on disk rather than having it all in memory. This is nice because it can become quite large if inlined parts are used. Note however, that we currently generate all the delta parts in memory before adding them to the delta, so we still keep all individual parts in memory. Fixing that is the next step. Closes: #1309 Approved by: cgwalters
We will do some changes later that need these earliers, so move them up. Closes: #1309 Approved by: cgwalters
Directly when we allocate a new part we finish the old one, writing the compressed data to a temporary file and generating the delta header for it. When all these are done we loop over them and collect the headers, sizes and either copy the tempfile data into the inlined superblock or link the tempfiles to disk with the proper names. Closes: #1309 Approved by: cgwalters
☀️ Test successful - status-atomicjenkins |
This adds a new OtVariantBuilder helper that supports creating potentially large GVariants by directly writing them to a file descriptor, rather than building them up in memory. It contains some copied private GVariant code from glib which is used as the basis for the builder.
It then uses this builder to write the delta superblock. Additionally it changes the way the delta parts are written so that we don't have to keep all the deltas in memory, but can write them to a temporary file as they are done. For inlined parts these temporary files are then copied directly into the superblock fd.
This drops the requirement for generating static deltas a lot. In a test creating a (non-inlined) static delta for the gnome sdk went from 1.5 gig virt/1g res to 600 meg virt/300m res.