This repository has been archived by the owner on Feb 7, 2024. It is now read-only.

Create tarballs with contents in deterministic order #210

Closed
mark-kubacki opened this issue Nov 11, 2016 · 5 comments

Comments

@mark-kubacki
Contributor

mark-kubacki commented Nov 11, 2016

Contents of the aci/tar file are currently not ordered in any way. Yet that is needed for deterministic builds.

Please use tar --sort=name (available in tar version ≥1.28) when creating the target image.aci.

Or, even better: please expose parameter lists for tar and gnupg so a user can add flags of their choice.

# ~/.config/dgr/config.yaml
rkt:
  …
tar:
  extra_params: ['--sort=name', '-J']
gpg:
  extra_params: […]
…
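
As a rough sketch of what such pass-through could look like (the function name `build_tar_argv` and the dict layout are hypothetical, chosen only to mirror the `tar.extra_params` key proposed above; this is not dgr's code), user flags could be spliced into the argument list while `-f` stays directly in front of the archive name:

```python
def build_tar_argv(target, sources, extra_params=()):
    """Return a tar command line with user-supplied flags placed before -f.

    Hypothetical helper: -f and the archive name are kept adjacent so that
    no extra flag can end up between them.
    """
    return ["tar", "-c", *extra_params, "-f", target, *sources]


# Hypothetical in-memory form of the proposed config.yaml fragment.
config = {"tar": {"extra_params": ["--sort=name", "-J"]}}

argv = build_tar_argv(
    "image.aci", ["manifest", "rootfs"], config["tar"]["extra_params"]
)
print(argv)
```

The list could then be handed to `subprocess.run(argv, check=True)`. Whether a given flag is accepted depends on the tar installed on the host.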

You'd need deterministic builds for:

  1. (trust) Reproducing the build of an image. You'd need tar's --clamp-mtime for that, too.
  2. (auditing, trust, certification) Idempotently built base images. For example, given an image blitznote.com/ubuntu:16.04 and a Git repo dgr-ubuntu at a fixed commit, the image, together with the build commands, can be used to build itself, proving that nothing has been changed a posteriori. (This limits the scope of an audit to a few scripts.)
  3. (size, download) Leverage de-duplication of the storage backend, and tools such as zsync to only download differences between an old image and a recently updated one.
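
To illustrate the two ingredients named so far (a fixed member order and clamped modification times), here is a minimal sketch using Python's standard `tarfile` module rather than GNU tar. The function name `write_sorted_tar` is an assumption made for the example. The sort is by raw byte value of the relative path, which is locale-independent but not necessarily the exact order `tar --sort=name` produces. Ownership and permission bits are left untouched.

```python
import os
import tarfile


def write_sorted_tar(src_dir, out, clamp_mtime=0):
    """Write the contents of src_dir to the binary file object `out`.

    Members are added in byte-wise sorted order, and any mtime newer than
    clamp_mtime is lowered to clamp_mtime (similar in spirit to GNU tar's
    --sort=name and --clamp-mtime; this is a sketch, not a re-implementation).
    """
    paths = []
    for root, dirs, files in os.walk(src_dir):
        for name in dirs + files:
            paths.append(os.path.relpath(os.path.join(root, name), src_dir))
    # Sorting encoded bytes avoids any dependence on the current locale.
    paths.sort(key=os.fsencode)

    with tarfile.open(fileobj=out, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for rel in paths:
            full = os.path.join(src_dir, rel)
            info = tar.gettarinfo(full, arcname=rel)
            info.mtime = min(int(info.mtime), clamp_mtime)
            if info.isreg():
                with open(full, "rb") as f:
                    tar.addfile(info, f)
            else:
                # Directories, symlinks and similar entries carry no data.
                tar.addfile(info)
```

Two directory trees with the same content, permissions and ownership should then yield byte-identical archives regardless of the order in which the files were created.
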
@n0rad
Member

n0rad commented Nov 11, 2016

I'm pretty sure you cannot reproduce the exact same aci by building it twice.

  • File dates will be different
  • The package DB will have different dates
  • Software config files written by packages often have a generated date in comments right inside them
  • apt does not let you install the specific package versions you want, only the latest stable ones

Of course it depends on the package manager you are using, but none that I'm aware of is designed to create exactly reproducible installs (byte for byte).

It's too bad, but I don't think it's a big issue: the goal is to build an immutable aci that you will store, share and reuse. Rebuilding it exactly the same way is useless in this case.

Also, note that dgr is designed to make the build reproducible at the layer it manages. That is, except for what you do inside build and builder, which may access external resources that change, everything handled by dgr is reproducible. Nothing on the host affects the build, since the build itself also runs inside a container.

@n0rad
Member

n0rad commented Nov 11, 2016

Your point 3 is weird: the download of an aci is handled directly by rkt, which does not support such a thing.

Using dependencies will reduce the size of the aci you are building.
For example, on archlinux I build an aci-arch-minimal that I use as a dependency of pretty much all my other acis, so that the final ACI contains only the software layer I want.

@mark-kubacki
Contributor Author

mark-kubacki commented Nov 11, 2016

Thank you for looking into this!

The issue is really independent of Ubuntu or Debian, though you can create such builds using them.

  • File dates can be modified. For example, the aforementioned --clamp-mtime does that.
  • You can install specific versions with apt like this, for example:
    apt -y -t yakkety install apt=1.3.1,
    or using emerge from Gentoo like this:
    emerge =dev-lang/python-2.7.9 and so on.

If dgr created a tarball with a reproducible order, independent of locale and system (as with the parameter suggested above), I could easily catch any remaining differences.
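
One way such remaining differences could be caught is a member-by-member comparison of two archives. The following is only a sketch with made-up helper names (`tar_index`, `tar_diff`). It compares type, size, mtime, mode and a SHA-256 of each regular file's content, and takes seekable binary file objects as input.

```python
import hashlib
import tarfile


def tar_index(fileobj):
    """Map each member name to (type, size, mtime, mode, content digest)."""
    index = {}
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar:
            digest = None
            if member.isreg():
                data = tar.extractfile(member).read()
                digest = hashlib.sha256(data).hexdigest()
            index[member.name] = (
                member.type, member.size, member.mtime, member.mode, digest
            )
    return index


def tar_diff(old, new):
    """Return the names that were added, removed or changed between two tars."""
    a, b = tar_index(old), tar_index(new)
    return {
        "added": sorted(b.keys() - a.keys()),
        "removed": sorted(a.keys() - b.keys()),
        "changed": sorted(n for n in a.keys() & b.keys() if a[n] != b[n]),
    }
```

With a deterministic member order, a plain byte comparison of the two files would already say whether anything differs. A listing like this one only narrows down where.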

My point (3) has been written with a remote storage in mind (think: webserver serving the images) – not any particular local storage, such as rkt's.

Regarding your description of building minimal layers: I don't see the connection to ordered files in ACIs. On the matter of minimal file sizes you might indeed want to take a look at how I arrived at a minimal Ubuntu image for Docker or examine how I build most of my other Docker images for that matter. ;-)

This issue is strictly about two things:

  1. ordered files in the aci/tar
  2. optionally, user-defined parameters to tar

@n0rad
Member

n0rad commented Nov 12, 2016

I'm not against doing it; it should not be a lot of work on our side.
I just want to point out that I think it's not necessary.

@mark-kubacki
Contributor Author

Thanks! I could fork and modify dgr for that, but I think it's small enough to not warrant that overhead, and would benefit more users than me.

mark-kubacki added a commit to mark-kubacki/blablacar-dgr that referenced this issue Mar 5, 2017
If the manifest is the first file then tools do not need to download
the whole image file in order to detect changes or updates.

Sorted contents of ACIs allow for easier comparison, and usage of
tools such as zsync, and deduplication on the server. The price,
sorting by 'tar', is cheap.

Parameter 'f' for 'tar' must be followed by the filename for recent
versions of 'tar'. Mind the order!

closes blablacar#210
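
The benefit of a leading manifest can be sketched as follows. The example assumes an uncompressed or transparently decompressible tar stream whose first member is a JSON file named `manifest`. The function name `read_leading_manifest` is hypothetical. Reading in `tarfile`'s streaming mode (`r|*`) consumes the input sequentially and stops as soon as the first member has been read.

```python
import json
import tarfile


def read_leading_manifest(stream):
    """Return the parsed manifest if it is the first tar member, else None.

    `stream` is any binary file-like object with a read() method, for
    example an HTTP response body. Only the beginning of it is consumed.
    """
    with tarfile.open(fileobj=stream, mode="r|*") as tar:
        first = tar.next()
        if first is None or not first.isreg():
            return None
        if first.name not in ("manifest", "./manifest"):
            return None
        return json.load(tar.extractfile(first))
```

A client could compare a field of the returned manifest against its cached copy and skip the rest of the download when nothing has changed.
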
mark-kubacki added a commit to mark-kubacki/blablacar-dgr that referenced this issue Mar 13, 2017
If the manifest is the first file then tools do not need to download
the whole image file in order to detect changes or updates.

Sorted contents of ACIs allow for easier comparison, and usage of
tools such as zsync, and deduplication on the server. The price,
sorting by 'tar', is cheap.

Parameter 'f' for 'tar' must be followed by the filename for recent
versions of 'tar'. Mind the order!

closes blablacar#210
mark-kubacki added a commit to mark-kubacki/blablacar-dgr that referenced this issue Mar 14, 2017
If the manifest is the first file then tools do not need to download
the whole image file in order to detect changes or updates.

Sorted contents of ACIs allow for easier comparison, and usage of
tools such as zsync, and deduplication on the server. The price,
sorting by 'tar', is cheap.

Parameter 'f' for 'tar' must be followed by the filename for recent
versions of 'tar'. Mind the order!

closes blablacar#210
mark-kubacki added a commit to mark-kubacki/blablacar-dgr that referenced this issue Mar 27, 2017
Enables dgr to work on hosts that don't have any tar. (See also blablacar#217.)

Sorts the contents to have a reproducible order, but pulls the manifest
file to the front for fast access.

Sorted contents of ACIs allow for easier comparison, and usage of
tools such as zsync, and deduplication on the server. The price,
sorting by 'tar', is cheap.

zap the "chdir-acrobatique", use `tar -C`:
    dgr changes paths and performs needless renames, which result in mayhem
if the process quits prematurely or the timing were off. The solution is to
use tar's `-C` param and transform the filenames.

closes blablacar#210
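
The ordering rule described in this commit message (sorted, with the manifest pulled to the front) can be expressed in a few lines. This is an illustrative sketch only. The function name `aci_member_order` and the extra top-level name `README` are made up for the example and are not taken from dgr.

```python
def aci_member_order(names, first="manifest"):
    """Return names sorted by code point, with `first` moved to the front."""
    rest = sorted(n for n in names if n != first)
    return ([first] if first in names else []) + rest


print(aci_member_order(["rootfs/b", "README", "rootfs", "manifest", "rootfs/a"]))
# prints ['manifest', 'README', 'rootfs', 'rootfs/a', 'rootfs/b']
```

Python compares `str` values by code point, so the result does not depend on the locale of the build host.
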
n0rad closed this as completed in 222aeb5 on May 9, 2017