Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Streaming static deltas #1309

Closed

Conversation

alexlarsson
Copy link
Member

This adds a new OtVariantBuilder helper that supports creating potentially large GVariants by directly writing them to a file descriptor, rather than building them up in memory. It contains some copied private GVariant code from glib which is used as the basis for the builder.

It then uses this builder to write the delta superblock. Additionally it changes the way the delta parts are written so that we don't have to keep all the deltas in memory, but can write them to a temporary file as they are done. For inlined parts these temporary files are then copied directly into the superblock fd.

This drops the requirement for generating static deltas a lot. In a test creating a (non-inlined) static delta for the gnome sdk went from 1.5 gig virt/1g res to 600 meg virt/300m res.

This is similar to GVariantBuilder in that it constructs variant
containers, but it writes it directly to a file descriptor rather
than keep the entier thing in memory. This is useful to create large
variants without using a lot of memory.
This allows us to create the final delta desciptor directly on disk
rather than having it all in memory. This is nice because it can
become quite large if inlined parts are used.

Note however, that we currently generate all the delta parts in
memory before adding them to the delta, so we still keep all individual
parts in memory. Fixing that is the next step.
We will do some changes later that need these earliers, so move them up.
Directly when we allocate a new part we finish the old one,
writing the compressed data to a temporary file and generating
the delta header for it.

When all these are done we loop over them and collect the headers,
sizes and either copy the tempfile data into the inlined superblock
or link the tempfiles to disk with the proper names.
@alexlarsson
Copy link
Member Author

Note, the deltas produced were identical, sans the timestamp.

@alexlarsson
Copy link
Member Author

This will help flatpak/flatpak#483

@alexlarsson
Copy link
Member Author

Gah, it failed because of the permissions on the superblock. Honestly glnx_link_tmpfile_at() should probably take a mode argument to avoid issues like this in the future.

@alexlarsson
Copy link
Member Author

src/libotutil/ot-variant-builder.c:1028:23: error: unused variable 'keep_around_until_return' [-Werror,-Wunused-variable]
  g_autoptr(GVariant) keep_around_until_return = g_variant_ref_sink (value);

It is used, by unref:ing it when we go out of scope. @cgwalters Do you know how to fix this warning?

@alexlarsson
Copy link
Member Author

Finally it passed CI :)

};

static GRecMutex g_variant_type_info_lock;
static GHashTable *g_variant_type_info_table;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm...won't this possibly break if there are two copies of this in process? (BTW last I measured this type info table was part of what made GVariant slow)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm hesistant a bit to do this without at least opening up a GLib bug and tracking this API addition there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would it break? Its jus a local static variable which protects a hashtable. It doesn't affect anything outside this file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking the GVariant APIs might be relying on something like pointer identity of GVariantType or something. But yeah, it doesn't look like that's the case. And in the end we're not returning a GVariant, so data transfer is just "one way" here.

@cgwalters
Copy link
Member

cgwalters commented Oct 26, 2017

(Just two quick comments...will look at this more today)

The other thing that occurs to me here is I think there's an argument that GVariant should only be used for metadata. Basically have [metadata][payload] and the total size of the payload is in metadata. Which is how it works today for the content objects in ostree. But...that'd be a format change for deltas with the concomitant versioning issues.

@alexlarsson
Copy link
Member Author

The gvariant wire format depends on knowing the total size of the container, so you cannot append arbitrary stuff after it and still work, without also adding some kind of header size field external to the gvariant.

@alexlarsson
Copy link
Member Author

btw, i think the second patch in this series that adds the finish_part() code could probably go in (with minor changes) even without the streaming writes. It will be less of a gain, but still a gain.

@cgwalters
Copy link
Member

cgwalters commented Oct 26, 2017

The gvariant wire format depends on knowing the total size of the container, so you cannot append arbitrary stuff after it and still work, without also adding some kind of header size field external to the gvariant.

Yes, but for content objects (and we could make this true for delta parts), total size = stbuf.st_size - header.

@cgwalters
Copy link
Member

I haven't looked closely, but I have a feeling it wouldn't be hard to add this to GLib directly if rather than using write() we used mmap() on the fd, and fallocate() to extend it.

@cgwalters
Copy link
Member

Anyways I'm OK to try merging this, but before I do I'm playing with manual-tests/static-delta-generate-crosscheck.sh on some FAH content and seeing how things work. That unearthed a few bugs (not apparently related to this)

@alexlarsson
Copy link
Member Author

Also, for that typeinfo hashtable, i copied the code as is. But it spends a lot of work building up some kind of offset table for quickly looking up members by offset, in generate_table(). If we only write stuff that code could be deleted (as well as some parts of GVariantMemberInfo.

I kept it there for now as I was thinking of possibly using this on the reader side when applying the static delta in streaming fashion.

@alexlarsson
Copy link
Member Author

@cgwalters if you mmap you run into issues on 32bit writing files > 2 GB.

@cgwalters
Copy link
Member

Yeah...but deltas were designed so that you'd generally make them on a "real" build server (they're not arch dependent). That said I understand for making bundles one wants it to work on "small" hardware too.

Anyways I tracked down the deltas issue I hit when testing this which wasn't related: #1312

@cgwalters
Copy link
Member

OK, I verified this passes my FAH test case too with my 2 other PRs applied.

@rh-atomic-bot r+ b200880

@rh-atomic-bot
Copy link

⌛ Testing commit b200880 with merge de0e015...

rh-atomic-bot pushed a commit that referenced this pull request Oct 27, 2017
This allows us to create the final delta desciptor directly on disk
rather than having it all in memory. This is nice because it can
become quite large if inlined parts are used.

Note however, that we currently generate all the delta parts in
memory before adding them to the delta, so we still keep all individual
parts in memory. Fixing that is the next step.

Closes: #1309
Approved by: cgwalters
rh-atomic-bot pushed a commit that referenced this pull request Oct 27, 2017
We will do some changes later that need these earliers, so move them up.

Closes: #1309
Approved by: cgwalters
rh-atomic-bot pushed a commit that referenced this pull request Oct 27, 2017
Directly when we allocate a new part we finish the old one,
writing the compressed data to a temporary file and generating
the delta header for it.

When all these are done we loop over them and collect the headers,
sizes and either copy the tempfile data into the inlined superblock
or link the tempfiles to disk with the proper names.

Closes: #1309
Approved by: cgwalters
@rh-atomic-bot
Copy link

☀️ Test successful - status-atomicjenkins
Approved by: cgwalters
Pushing de0e015 to master...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants