Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ostree: new transport #228

Merged
merged 1 commit into from
Apr 21, 2017
Merged

Conversation

giuseppe
Copy link
Member

skopeo copy docker://busybox ostree:foo:latest@/ostree/repo

Signed-off-by: Giuseppe Scrivano gscrivan@redhat.com

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

From a very quick first pass, this looks pretty good. Figuring out what is the canonical reference form and the namespace hierarchy, if any, is the major outstanding design work. Other than that, just a handful of small nits.

}

func (d *ostreeImageDestination) PutBlob(stream io.Reader, inputInfo types.BlobInfo) (types.BlobInfo, error) {
hash := strings.Replace(inputInfo.Digest.Hex(), "sha256:", "", 1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn’t there a native method to get the hex value without the algorithm ID? And this does need a check that the hash actually uses SHA256.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note that the input digest may be nil. An initial implementation may just fail in that case, fully general would be to create a temporary file, and use the computedDigest value.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn’t there a native method to get the hex value without the algorithm ID?

That’ method is .Hex() already, actually. So, that strings.Replace() should never be needed AFAICT; if it is, something is going wrong elsewhere.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I’m sorry, this has been addressed already.)

return types.BlobInfo{}, err
}

d.blobs[hash] = &blobToImport{Size : size, Digest : inputInfo.Digest, BlobPath : blobPath, Hash : hash}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this use computedDigest instead of inputInfo.Digest, which may be unknown?

(Also, this mentions a digest of the same object three times, is that truly necessary?)

return exec.Command("ostree", "commit",
fmt.Sprintf("--add-metadata-string=docker.size=%d", blob.Size),
fmt.Sprintf("--branch=%s", ostreeBranch),
fmt.Sprintf("--tree=dir=%s", destinationPath)).Run()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Just to be sure, is the untarred filesystem at destinationPath supposed to exist after this command returns, or is it supposed to be temporary?)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, it is deleted in .Commit. A comment to that effect here would be nice.

hash := strings.Replace(layer.Digest.Hex(), "sha256:", "", 1)
blob := d.blobs[hash]
if blob == nil {
continue
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silently ignoring failures seems unsafe

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment here, if blob is not in d.blobs then it means the layer was already present and we don't need to import it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There isn’t any check to ensure that the layer was already present. It might also just not have been uploaded, perhaps because the caller is doing it wrong.

But, ultimately, the caller is free to upload a broken image and does not have to be told that it is incomplete, so, this works.

if blob == nil {
continue
}
err := d.importBlob(true, blob)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two paths in importBlob, based on isTar, are so different, and each is called exactly once with a constant value of isTar; it might be cleaner to just have two functions (importLayer, importConfig?).

// GetManifest returns the image's manifest along with its MIME type (which may be empty when it can't be determined but the manifest is available).
// It may use a remote (= slow) service.
func (s *ostreeImageSource) GetManifest() ([]byte, string, error) {
return nil, "", errors.Errorf(`Getting manifest not supported by "ostree:"`)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this, it might be simpler to just fail in NewImageSource.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could get rid of this altogether I guess (the whole ostree src)

} else {
image, repo = s[0], s[1]
}
return ostreeReference{ref: path, repo: repo, image: image}, nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conventionally we use NewReference for calls from Go code which knows the native values, passed as separate parameters, and ParseReference for calls which get a single user-specified string input and parse it into the native Go values.

} else {
image, repo = s[0], s[1]
}
return ostreeReference{ref: path, repo: repo, image: image}, nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be fine (and perhaps preferable) to only store the real values of the components of the reference (repo, image) and not the original string.

// not required/guaranteed that it will be a valid input to Transport().ParseReference().
// Returns "" if configuration identities for these references are not supported.
func (ref ostreeReference) PolicyConfigurationIdentity() string {
return ref.ref
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PolicyConfigurationIdentity really should be in a “canonical” form, and fully explicit (in particular dealing with the @/ostree/repo defaulting.

}
path = path[:lastSlash]
res = append(res, path)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not really make sense; for whatever@/my/repo this would return whatever@/my, whatever@.

Designing the hierarchy here so that it makes sense for uses is basically the major outstanding action item for ostreeReference/ostreeTransport. (It would be possible to not have any parent namespaces, only a PolicyConfigurationIdentity.)

@giuseppe
Copy link
Member Author

I added a fixup comment ⬆️ to address all your comments. PutBlob handles also the case when the input digest is nil

@giuseppe giuseppe changed the title [RFC] ostree: new transport ostree: new transport Jan 29, 2017
@giuseppe
Copy link
Member Author

giuseppe commented Feb 3, 2017

@mtrmac @runcom is there any other change to address before it could be merged?

I am not 100% happy with using the cli of OSTree, I'll probably rework this in future to use the golang bindings (the golang bindings are not yet stable AFAIK)

BlobPath string
}

type descriptor struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're duplicating this too many. Worth using the official image-spec descriptor type

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a new commit to export Descriptor and use it

// GetManifest returns the image's manifest along with its MIME type (which may be empty when it can't be determined but the manifest is available).
// It may use a remote (= slow) service.
func (s *ostreeImageSource) GetManifest() ([]byte, string, error) {
return nil, "", errors.Errorf(`Getting manifest not supported by "ostree:"`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could get rid of this altogether I guess (the whole ostree src)

@@ -28,33 +28,33 @@ var gzippedEmptyLayer = []byte{
// gzippedEmptyLayerDigest is a digest of gzippedEmptyLayer
const gzippedEmptyLayerDigest = digest.Digest("sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4")

type descriptor struct {
type Descriptor struct {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are different descriptor schemas, and more importantly semantics / content expectations, in various formats (e.g. OCI expects different MediaType values). Name this perhaps Schema2Descriptor, please, to make it clear this is format-specific.

MediaType string `json:"mediaType"`
Size int64 `json:"size"`
Digest digest.Digest `json:"digest"`
URLs []string `json:"urls,omitempty"`
}

type manifestSchema2 struct {
type ManifestSchema2 struct {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not thrilled about exporting the entire implementation of the private genericManifest interface. Could this type be split into public a pure JSON-marhsallable data structure with no methods, and a private type implementing genericManifest, probably embedding that pure data structure?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haven't fully looked at the code, but why do we need this type to be exported at all? is it because ostree pkg needs it? we could move this under manifest/ I guess

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@runcom yes, I am exporting this type so that it could be used by OSTree, as it needs to parse the manifest to get the config and the layers

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, we'll, maybe we really need to export those in the manifest pkg (not as part of this PR). @mtrmac wdyt?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@runcom is it then fine if I drop the second patch from this PR and leave the OSTree refactoring for later?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a single schema2 data structure in the manifest package, used by all of docker/daemon, docker_schema1.go, docker_schema2.go, and the autodetection in manifest/manifest.go, does sound great.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This can be a separate PR, but if we are doing this, let’s actually do that instead of making the image/docker_schema2.go version public now.)

}

func (d *ostreeImageDestination) PutBlob(stream io.Reader, inputInfo types.BlobInfo) (types.BlobInfo, error) {
hash := strings.Replace(inputInfo.Digest.Hex(), "sha256:", "", 1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I’m sorry, this has been addressed already.)

manifestPath := filepath.Join(encodeOStreeRef(d.ref.image.String()), "manifest")
branch := encodeOStreeRef(d.ref.image.String())
err := exec.Command("ostree", "commit",
fmt.Sprintf("--add-metadata-string=docker.manifest=%s", string(d.manifest)),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

d.manifest is already a string, no need to cast it here. (Or perhaps make d.manifest a []byte, which is more natural, and keep the cast here. Either works.

hash := strings.Replace(layer.Digest.Hex(), "sha256:", "", 1)
blob := d.blobs[hash]
if blob == nil {
continue
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There isn’t any check to ensure that the layer was already present. It might also just not have been uploaded, perhaps because the caller is doing it wrong.

But, ultimately, the caller is free to upload a broken image and does not have to be told that it is incomplete, so, this works.

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates. This is a fairly detailed look, i.e. hopefully no big surprises outstanding.

The reference naming and policy identity, in particular the treatment of .repo, remain the major outstanding point.

// nil requestedManifestMIMETypes means manifest.DefaultRequestedManifestMIMETypes.
// The caller must call .Close() on the returned ImageSource.
func (ref ostreeReference) NewImageSource(ctx *types.SystemContext, requestedManifestMIMETypes []string) (types.ImageSource, error) {
return nil, nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return nil, errors.New("Reading ostree: images is currently not supported")

or something like that, please, to keep the contract. Silently returning nil would cause the caller to crash.


type ostreeTransport struct{}

type ostreeImageSource struct {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now unused.

// ostreeReference is an ImageReference for ostree paths.
type ostreeReference struct {
ref string
repo string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICS repo is only used in PolicyConfigurationIdentity, in particular it does not affect the filesystem paths and it is not passed to the ostree subprocesses at all. What is it supposed to affect?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the catch, repo must be passed to each invocation of ostree, as --repo=$REPO


image, err := reference.ParseNamed(imageStr)
imageTagged := reference.WithDefaultTag(image)
if err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this check one line up.

// (fully explicit, i.e. !reference.IsNameOnly, but reflecting user intent,
// not e.g. after redirect or alias processing), or nil if unknown/not applicable.
func (ref ostreeReference) DockerReference() reference.Named {
return nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this return ref.image? That would allow creating signatures when copying to ostree:.


func (d *ostreeImageDestination) Commit() error {
defer func() {
os.RemoveAll(encodeOStreeRef(d.ref.image.String()))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, all of encodeOStreeRef(d.ref.image.String()) is a temporary directory until commit. (Abstracting from its location), shouldn’t it actually be deleted in ostreeImageDestination.Close, so that the data does not lay around if the caller aborts and never calls Commit?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still outstanding AFAICS

if err != nil {
return err
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, is there a benefit to deferring the import to this point at all? PutBlob could store the data directly, perhaps without an on-disk temporary copy of the tarball sized in hundreds of megabytes. I suppose the difficulty would be in differentiating layer tarballs and the config?

This does work fine, just a suggestion to consider whether that makes sense.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I was storing the file immediately before, but then I changed to do it here as we know what must be stored as a tarball and what instead in the metadata. We could try to detect if the file is a .tar in the PutBlob function, but I think doing it here is cleaner (at the expense of keeping them)

@@ -0,0 +1,171 @@
package ostree
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Due to the impact on signatures we generally want as close to 100% code coverage for the ImageTransport and ImageReference implementations as possible, but that will obviously wait until the formats and semantics are nailed down—and I can write that.)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still outstanding. I’ll take a stab at this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

… not yet, this is blocked on the repo/image hierarchy and naming decision.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a new version. I changed the repo/image hierarchy separator to be ':' as I found '@' to be confusing (we already use @ for "image@repo")

}

// layerPath returns a path for a layer tarball within a ostree using our conventions.
func (ref ostreeReference) layerPath(digest digest.Digest) string {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unused.


var ostreeRefRegexp = regexp.MustCompile(`^[A-Za-z0-9.-]$`)

func encodeOStreeRef(in string) string {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is always called as encodeOStreeRef(ref.image.String()). It seems that this should be a method on ImageReference, like e.g. manifestPath below.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Making encodeOSTreeRef a method the way manifestPath would be more consistent, but this does work fine.)

@@ -131,6 +131,7 @@ func (d *ostreeImageDestination) importBlob(blob *blobToImport) (error) {
return err
}
return exec.Command("ostree", "commit",
fmt.Sprintf("--repo=%s", d.ref.repo),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW ostree seems to also accept separate parameters without the =e.g. […, "--repo, d.ref.repo, …]. That would allow getting rid of all of these fmt.Sprintf` calls.

}

func (d *ostreeImageDestination) PutSignatures(signatures [][]byte) error {
if err := ensureParentDirectoryExists(d.ref.signaturePath(0)); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know very little about ostree; is there anywhere I can read up on this issue?

What you seem to be saying is that at the time of storing an image into the ostree repo, the writer must use the “correct” SELinux types, but the writer does not know where the reader will check out the image, .e. the writer can not know what the “correct” types are?! If so, how is atomic better positioned to know this?

@giuseppe
Copy link
Member Author

giuseppe commented Feb 6, 2017

atomic knows where the files will be checked out since checkout_path can be overriden in the atomic.conf file.

When we commit a directory to OSTree, OSTree stores each file by its checksum, the checksum includes also the xattrs of the file, this is why we need to store it with the correct SELinux labels. When we extract the .tar of a layer, we need to do that to a directory where we know it will get the right SELinux labels.

We could use /var/lib/containers/atomic for the temporary directory (as this directory is used in containers-selinux) fo files to get the right labels, but I'd prefer that we are not going to hardcode this path in containers/image as well: 1) the atomic tool will still be the owner of that directory and its subdirectories 2) keep the freedom to change it (if necessary) from atomic and containers-selinux only.

@rhatdan what do you think of this?

Using the current working directory seemed like a good compromise to me, otherwise, should we introduce an environment variable to decide where to keep temporary files?

@rhatdan
Copy link
Member

rhatdan commented Feb 6, 2017

I think we might be conflating two different concepts here. OSTRee used for the atomic host so that paths like /usr/bin/docker is labeled container_runtime_t, versus paths in a container for a system container, which probably have no label associated with them. Their is nothing in an OCI image to identify the label of a file path. Pretty much we treat the entire content of a container as a single label
usually something that looks like system_u:object_r:container_file_t:s0:c1,c2.

In the case of system containers it will just be the default label of /var/lib/containers.

@giuseppe
Copy link
Member Author

giuseppe commented Feb 6, 2017

@rhatdan yes, exactly it will be the label of /var/lib/containers. The decision here to take is to leave to atomic the goal of creating the correct temporary directory and run skopeo from there, or manage the temporary directory from containers/image itself.

non root users can also pull images to their OSTree repository, not shared with the system. So we will have to handle this case as well, as these users have no write access to /var/lib/containers. There is already the logic in atomic to do this, I'd prefer to leave it there and run skopeo in the correct location instead of duplicating and maintaining this logic here as well.

@mtrmac is fine to keep creating the tmp files in the current working directory or is it a blocker? Would an environment variable work fine?

@rhatdan
Copy link
Member

rhatdan commented Feb 6, 2017

I think the higher level concepts should stay in atomic command and just use PWD for tmp files.

@mtrmac
Copy link
Collaborator

mtrmac commented Feb 6, 2017

The decision here to take is to leave to atomic the goal of creating the correct temporary directory and run skopeo from there, or manage the temporary directory from containers/image itself.

To step back here, I think containers/image should ideally work stand-alone without the caller having to care about implementation details of the individual transports.

atomic very likely won’t remain the only caller forever: e.g. if we ever set up cri-o to use ostree, we will need this to work without running an atomic command, and it would be nice if cri-o didn’t have to know anything about the ostree implementation details. And it is pretty much a non-starter for a long-running daemon like cri-o to modify per-process (not even per-thread!) global state like the current working directory or environment variables.

If the only cost of this were having to parse atomic.conf in ostreeImageDestination, that seems very much worth it to me.


Using the current working directory seemed like a good compromise to me,

This can fail pretty horribly in general; it does not even have to be writable! It may be on a tmpfs and cause the machine run out of memory.

otherwise, should we introduce an environment variable to decide where to keep temporary files?

We have the types.SystemContext mechanism for passing transport-specific options and overrides, and that would be more suitable than per-process environment variables.


My preference would be (but I may very well be misunderstanding the problem) to get the temporary directory path by:

  1. (Optionally) first checking types.SystemContext.OSTreeTemporaryPath, if it is defined. (Or should that be OSTreeSELinuxContext, to be explicit about what we care about instead of indirectly expressing this as paths?)
  2. Otherwise, if defined and available, parse atomic.conf for the checkout_path value.
  3. Otherwise use a hard-coded default.

OTOH if you want to keep the current behavior, extracting directly into a temporary directory, I can live with this being an atomic-specific hack slightly less horrible than skopeo layers extracting data into the current directory, which is also an atomic-specific hack; at least we would get the signature support, and hopefully we can improve on this later.

@mtrmac
Copy link
Collaborator

mtrmac commented Feb 6, 2017

OTOH if you want to keep the current behavior, extracting directly into a temporary directory, I can live with this being an atomic-specific hack slightly less horrible than skopeo layers extracting data into the current directory, which is also an atomic-specific hack; at least we would get the signature support, and hopefully we can improve on this later.

This would, at the very least, need a very explicit documentation about this behavior and the expectations on the caller somewhere.

@giuseppe
Copy link
Member Author

giuseppe commented Feb 7, 2017

@mtrmac, I agree that using the CWD probably is not the best thing to do when using containers/image as a library. I was looking at it only as a replacement for the current skopeo layers. I am fine to change this, just one question, in atomic we will still use Skopeo as a CLI tool, not like a library.

How to specify the backend specific data (types.SystemContext) when using skopeo copy?

@mtrmac
Copy link
Collaborator

mtrmac commented Feb 7, 2017

For skopeo copy it would be a new command-line option, the way SignBy or RemoveSignatures are now.

(As I said above, I can live with using the current working directory, it is still an improvement; just not nearly as big as I’d like.)

@giuseppe giuseppe force-pushed the ostree-transport branch 3 times, most recently from bf00efd to 5c36998 Compare February 7, 2017 16:26
@giuseppe
Copy link
Member Author

giuseppe commented Feb 7, 2017

I added `tmpDirPath' to SystemContext and I use this directory for the temporary files

@giuseppe
Copy link
Member Author

@mtrmac @runcom is this fine to go? I'll start working on the Skopeo changes as soon as it is merged

@giuseppe giuseppe force-pushed the ostree-transport branch 4 times, most recently from e79eb78 to f4c54a4 Compare February 10, 2017 16:58
@giuseppe
Copy link
Member Author

@mtrmac I've cherry picked your commit and done some changes to it

@giuseppe
Copy link
Member Author

I rebased the patch on top of origin/master, but I am seeing some regressions in the Makefile, I opened a separate PR:

#242

@giuseppe
Copy link
Member Author

giuseppe commented Mar 2, 2017

@mitr is the last version better?

I've opened a WIP PR for Skopeo to add the other bits there:

containers/skopeo#314

@giuseppe
Copy link
Member Author

giuseppe commented Mar 7, 2017

ping

@giuseppe giuseppe force-pushed the ostree-transport branch 2 times, most recently from 7318e40 to eeb9d37 Compare March 14, 2017 14:13
@giuseppe
Copy link
Member Author

any update on this? @runcom @rhatdan @mtrmac

@rhatdan
Copy link
Member

rhatdan commented Mar 21, 2017

@runcom @mtrmac Can we get this reviewed again?

@mtrmac
Copy link
Collaborator

mtrmac commented Apr 8, 2017

(Notes to self, please ignore.

If I am reading Atomic/syscontainers.py:SystemContainers.{_encode_to_ostree_ref,_convert_to_skopeo} correctly, atomic is currently treating the various Docker-equivalent busybox strings as equivalent for source pulling, and therefore signature verification (not explicitly, but because skopeo… docker://does), but distinct for destination ostree storage (except for adding a default :latest tag). Is there a semantic commitment or merely an UI which mixes two namespaces into a single string?

Also, _encode_to_ostree_ref seems to care about registries but it does not, really, in an observable way; it splits the registry part of a string, if any, but then reconstitutes it into the original form.

What/who ultimately “owns” the OStree ociimage/ (syscontainers.py:OSTREE_OCIIMAGE_PREFIX) namespace and defines the rules?)

@mtrmac
Copy link
Collaborator

mtrmac commented Apr 10, 2017

for signing we should use the same semantics of Docker, since that is the semantic we follow for pulling an image anyway. The extra code that caused confusion was related only to how the image is stored in OSTree itself. This part might change in future and work as Docker, but we will need to sync with atomic so for now I've left the same logic of how "atomic pull --storage ostree" works today. I hope the new version is cleaner as I've moved the OSTree specific bits to their own function.

I can’t quite wrap my head around this, the way I understand it. It’s not really possible for the busybox and docker.io/library/busybox:latest strings to be equivalent for purposes of signature verification, but distinct for purposes of image storage / referring to images locally. If the images are different then their signatures should not be substituable.


@giuseppe Reading through syscontainers.py and the surrounding ecosystem, my best guess at the intended interpretation is that for ostree OCI, the references are (a sequence of /-separated fragments, with a mandatory :tag trailer) mapped using _$hex$hex escape mechanism into a single libOSTree fragment (i.e. there are no further OSTree-visible / separators).

Within the above, the only normalization / equivalence classes are formed by the implied :latest suffix. Is that correct?

Also, the namespace used here is intentionally using this encoding and hiding it from the user, i.e. there is no desire to expose all libOSTree names (e.g. those without the ociimages/ prefix in branch name, or those not using the _$hex$hex escaping). Presumably only images using the ociimages/$encoded form are expected to contain the required docker.manifest etc. annotations, so exposing the full libOSTree namespace would not actually be desirable. Is that correct?

Within this context, while (atomic pull $foo) uses $foo for both pulling from registries (using docker/distribution canonicalization, with either the docker.io defaulting, or the atomic repository search semantics) and into libOSTree ociimages/$encoded (using no repository canonicalization, only the :latest default tag), my best guess is that this should be treated as an UI matter (providing a convenient(?) shortcut for the user using a single string for both services), not as a claim that the two services use an equivalent namespace with equivalent rules. That’s at least a possible way to solve the “is equivalent/is not equivalent, at the same time” conundrum. Is that a reasonable way to think about this, or is the intent different?

If so, I can try updating the code in that direction.


[Not getting at all into the details of how long names are allowed to be and what are the exact allowed characters in various components and so on; given things like if len(branch_id) == 64 when interpreting existing branches, while not caring about that at all when creating new branches, it’s just not worth it to try to get a precise spec before this PR is finalized. Let’s just do something and either expanding or restricting these details can happen later as bug reports, inevitably, come in.]

@giuseppe
Copy link
Member Author

Yes exactly, we don't want to expose all the OSTree branches as references that are not under the 'ociimage/' namespace. Images that are not ociimage/ are neither OCI layers nor storing the manifest metadata which is needed to recreate the image from its layers. We must handle only the references that are under 'ociimage/'.

Under ociimage/ we maintain both the metadata for the images (where we just store the manifest info), and the OCI layers that are in the form ociimage/$checksum. The mangling used on the image names is needed to circumvent a limitation in the branch names allowed by OSTree).

We actually store and use the full name of the image because we want to be able to use system containers also when the local Docker engine is not running, so we are not using the atomic search semantic. We can change this in future, as long as we keep the ability of running without requiring Docker.

@mtrmac
Copy link
Collaborator

mtrmac commented Apr 11, 2017

@giuseppe Please take a look at https://github.com/mtrmac/image/tree/ostree-not-dockerlike (built on top of your work, rebased. See the commit messages of the individual commits for more details on what/why; then they can all be squashed.

Does this make sense, and correspond to what you want the semantics to be?

(Warning: Completely untested apart from tests passing.)

@giuseppe
Copy link
Member Author

@mtrmac thanks for the extra patches! I've tested them with Skopeo and everything seems to work fine. I've only added one patch to correctly register the OSTree transport.

@mtrmac
Copy link
Collaborator

mtrmac commented Apr 12, 2017

I've only added one patch to correctly register the OSTree transport.

What the… I did notice that you have added the registration in your latest branch, and I remember quite strongly starting the work in that branch (and the presence of ostreeBranchForImage does suggest it’s the latest variant or something very close). I’m afraid I have deleted my WIP branches already, so I have absolutely no idea what I have done to screw this up. Anyway, if this works for you, great.

@runcom PTAL.

@runcom
Copy link
Member

runcom commented Apr 20, 2017

LGTM

Approved with PullApprove

@mtrmac
Copy link
Collaborator

mtrmac commented Apr 20, 2017

🎉 @giuseppe Please squash the commits and let’s merge this.

@giuseppe
Copy link
Member Author

@mtrmac sure, do you prefer a single commit or any other preference?

@mtrmac
Copy link
Collaborator

mtrmac commented Apr 20, 2017

One commit would work just fine; I don’t know whether there is any other split you may wish to preserve.

skopeo copy docker://busybox ostree:foo:latest@/ostree/repo

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe
Copy link
Member Author

I've squashed all the commits into a single one ⬆️

@mtrmac
Copy link
Collaborator

mtrmac commented Apr 21, 2017

👍

Approved with PullApprove

@runcom
Copy link
Member

runcom commented Apr 21, 2017

needs rebase already :(

@mtrmac mtrmac merged commit 522fb43 into containers:master Apr 21, 2017
@rhatdan
Copy link
Member

rhatdan commented Apr 21, 2017

Awesome work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants