Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add oci-archive transport that creates a tar archive of an image #314

Merged
merged 2 commits into from
Aug 31, 2017

Conversation

umohnani8
Copy link
Member

The oci-archive transport uses oci to create a directory of the image, which is then tarred up into a file. oci-archive will be used in kpod save and kpod load for oci images.
Signed-off-by: umohnani8 umohnani@redhat.com

@umohnani8
Copy link
Member Author

@rhatdan @mtrmac PTAL

index imgspecv1.Index
}

var ociImgDest types.ImageDestination
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be added to the struct above?

if err := tar(path); err != nil {
return err
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to remove the ociImg when you are done.

// temporary path of the tar file
tempPath := path + "1"
defer os.Rename(tempPath, path)
defer os.RemoveAll(path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

THis should happen in the caller, not here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not leave it in tar? It tars up the directory, removes the directory and renames the tar file to what the directory was before.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be somewhat nice if the code allowed replacing an older tar file in the destination location atomically with a new image. It’s by no means required now, but it shouldn’t be impossible.

And even disregarding atomicity, it is at least somewhat plausible that the user has used exactly the same destination path in the past, and a file already exists there, so we can’t just create a directory in the same path, at least without removing the old contents.

Overall, sharing the path for both the file and directory, though kinda a neat trick, is just not worth the hassle AFAICT.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I call into a function with a file, I don't expect to have the file modified or changed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also allows the caller to control the naming. I would rather have temporary names.

}

// tar converts the direstory at path to a tar file and deletes the directory
func tar(path string) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should take an src,dest and then don't do the Rename stuff.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah; newImageDestinationArchive should create a temporary directory (grep for temporaryDirectoryForBigFiles, not in the default path), create an oci/layout.ociReference pointing to that reference, then the Commit implementation should just tar it up into the desired destination path, and Close delete the temporary directory.

}

// unTar unpacks a tar file into a directory and deletes the tar file
func unTar(path string) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should have src,dest

@@ -0,0 +1,126 @@
package layout
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use a separate …/oci/archive subpackage for all of this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

She will have to export the interface from OCI if she does.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only as long as the code keeps relying on the internals of ociReference etc., which should not be necessary if the rest of the code converts to use temporary directories.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started out by packaging this separately, but I was having a hard time initializing ociReference. Since all the functions of oci-archive call the functions in oci, I need to be able to set ociReference and use it to call the oci functions. I was using Get and Set functions but was failing to get it work.
Things make more sense now, so I am going to work on separating it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oci/layout.NewReference should be all that is necessary AFAICT.

Then, perhaps, some utility subpackage to minimize the amount of copy&pasted parsing/formatting code shared between oci/layout and oci/archive, ValidatePolicyConfigurationScope and the like — depending on the details of how the two end up.

)

func init() {
transports.Register(ArchiveTransport)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The types.ImageTransport and types.ImageReference types, due to their relevance for signature enforcement, should have test coverage close to 100%. Perhaps roughly base it on docker/archive/transport_test.go, or I might be able to help with that later.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

… but let’s wait with the tests until the design, and in particular the reference parsing, is designed and nailed down.

// ArchiveTransport is an ImageTransport for OCI archive
// it creates an oci-archive tar file by calling into the OCI transport
// tarring the directory created by oci and deleting the directory
var ArchiveTransport = ociArchiveTransport{}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the oci/archive package, this should be named just Transport.


type ociArchiveTransport struct{}

var ociTr = ociTransport{}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The types.ImageTransport implementations should generally be stateless singletons (containers-storage is the only exception so far). Just call oci/layout.Transport.$method directly if necessary.

type ociArchiveTransport struct{}

var ociTr = ociTransport{}
var ociRef types.ImageReference
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can't be a global variable, that would make it impossible to use multiple ociArchiveReference objects simultaneously. Looks like this should be a member of ociArchiveReference.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

… but see the discussion in ImageSource/ImageDestination implementations; it rather seems that the ociReference lifetime should be tied to the lifetimes of the temporary directories, i.e. only members of ociArchiveImageDestination/ociArchiveImageSource

}

// tar converts the direstory at path to a tar file and deletes the directory
func tar(path string) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah; newImageDestinationArchive should create a temporary directory (grep for temporaryDirectoryForBigFiles, not in the default path), create an oci/layout.ociReference pointing to that reference, then the Commit implementation should just tar it up into the desired destination path, and Close delete the temporary directory.

// temporary path of the tar file
tempPath := path + "1"
defer os.Rename(tempPath, path)
defer os.RemoveAll(path)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be somewhat nice if the code allowed replacing an older tar file in the destination location atomically with a new image. It’s by no means required now, but it shouldn’t be impossible.

And even disregarding atomicity, it is at least somewhat plausible that the user has used exactly the same destination path in the past, and a file already exists there, so we can’t just create a directory in the same path, at least without removing the old contents.

Overall, sharing the path for both the file and directory, though kinda a neat trick, is just not worth the hassle AFAICT.

"github.com/pkg/errors"
)

var ociImgSrc types.ImageSource
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be a member of ociArchiveImageSource, not a global variable.


type ociArchiveImageDestination struct {
ref ociArchiveReference
index imgspecv1.Index
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

index seems unused.

return err
}
path := ociRef.(ociReference).resolvedDir
if err := tar(path); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ociArchiveImageSource really must not modify the original file in any way, not even with the best intentions of replacing it later:

  1. In the meantime others expecting to use the original file at that path can't do so.
  2. The process may crash, leaving the original replaced by an unexpected directory forever.
  3. Nothing guarantees that untar+tar will reproduce the original byte-for-byte exactly.

Similarly to the discussion about ociArchiveImageDestination, newImageSourceArchive should create a temporary directory and extract the image there.

(Alternatively, it would be also possible to implement ociArchiveImageSource which does not need to extract the tarball at all, by just seeking around the original tarball, like docker/tarfile/Source.openTarComponent and callers do. Then oci/archive and oci/layout could share the format implementation, differing only in the backend used to access file components. But that would only work with seekable inputs; extracting the tarball and calling the layout implementation directly is perfectly fine.)


type ociArchiveImageSource struct {
ref ociArchiveReference
descriptor imgspecv1.Descriptor
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

descriptor seems unused.

@umohnani8 umohnani8 force-pushed the oci_archive branch 2 times, most recently from bf4293e to 5c221e4 Compare July 28, 2017 15:35
@umohnani8
Copy link
Member Author

umohnani8 commented Jul 28, 2017

I separated oci-archive to a sub package called archive. Got rid of the global variables and added ociImageDestination and ociImageSource to the respective Archive structs.
I create a temp directory in /var/tmp called oci-image and pass this path to ociReference - ociArchiveReference stores the path given by the user so that the tar file is stored there at the end.
This temp directory is then deleted in Close().
@rhatdan @mtrmac PTAL.


// Reference returns the reference used to set up this destination.
func (d *ociArchiveImageDestination) Reference() types.ImageReference {
return d.ociImgDest.Reference()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should return d.ref. Same in the ImageSource implementation.

func (d *ociArchiveImageDestination) Close() error {
_, tmpDir, _ := layout.GetOciReference(d.ref.ociRef)
defer os.RemoveAll(tmpDir)
return d.ociImgDest.Close()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a possible failure of os.RemoveAll should be reported to the user (though it arguably that whatever the user wanted to do did succeed); instead of defer, just call the two things in sequence and check errors. Same in the ImageSource implementation.

// tar converts the direstory at src and saves it to dst
func tar(src, dst string) error {
// input is a stream of bytes from the archive of the directory at path
input, err := archive.Tar(src, 0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

archive.Uncompressed would be more readable than 0.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mtrmac could you explain this a bit more? What should it be if not 0?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The archive.Uncompressed typed constant. (The constant does the value 0, but it is typed and readable without checking the source code of the archive package)

}

// creates a temporary file
outFile, err := os.Create(dst)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dst is not a temporary path AFAICT.


type ociArchiveImageDestination struct {
ref ociArchiveReference
ociImgDest types.ImageDestination
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be a bit more readable when named unpackedDest, unpacked, or something like that. Same in the ImageSource implementation.

}

// unTar unpacks the tar file at src and saves it at dst
func unTar(src, dst string) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This, at 4 lines of code and 4 lines of function wrapping + 3 lines at the single call site, is small enough that it could be inlined into the caller; but having a separate function, for symmetry with tar, is also fine.)

// GetOciReference returns the dir, resolvedDir, and tag of ociReference
// so that it is accessible in oci/archive
func GetOciReference(t types.ImageReference) (string, string, string) {
return t.(ociReference).dir, t.(ociReference).resolvedDir, t.(ociReference).tag
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would just panic if the caller supplied a wrong type. But overall, after the use in archive.ParseReference is removed, this won’t be needed: when the caller creates a reference for the temporary directory, it can just remember the directory path. That wastes a tiny bit of memory but the code is easier to follow and clearly can’t panic.

if err != nil {
return nil, errors.Wrapf(err, "error parsing reference %q", reference)
}
dir, resolvedDir, tag := layout.GetOciReference(ociArchRef)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still passing a file path to a transport which expects a directory.

Please just copy&paste most of oci/layout/oci_transport.go into this file, more or less with s/dir/file/g; s/layout/archive/g instead of bending the semantics like this. Or perhaps create a separate helper subpackage for the shared code, which is explicitly documented to use generic paths without knowing whether they are files or directories.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mtrmac okay, would you prefer I copy the code from oci_transport.go or create a separate helper package? For the helper package, would the main changes just be the documentation and variable names?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I’d prefer to just copy that code; that will be sufficient to get this PR merged from my point of view.

We can live with two duplicates of that code for a while I think (@runcom ?); cleaning that up can be a separate PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree with @mtrmac to keep duplicated code for now


// tempRef is the path where the directory of the oci image created by oci is stored
// this directory is deleted after a tar file is created from it
tempRef := "/var/tmp/oci-image:" + tag
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing guarantees the "oci-image:"+tag file name is available; use ioutil.TempDir to get a guaranteed-unique-and-available name.

OTOH, ParseReference may be used by callers who will never be calling NewImageSource/NewImageDestination (e.g. when parsing /etc/containers/policy.json), so this method should not be modifying the filesystem at all; so it also can’t create the layout reference. Both will need to happen later in newImageSource/newImageDestination.

Finally, please use the temporaryDirectoryForBigFiles constant the way other sites do. Eventually we’d like to have that directory configurable via types.SystemContext; for now, just using the same constant name makes it easy to find other cases.

// this directory is deleted after a tar file is created from it
tempRef := "/var/tmp/oci-image:" + tag
// ociRef stores the dir, resolvedDir, and tag of the temporary path
ociRef, err := layout.Transport.ParseReference(tempRef)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is layout.NewReference with separate dir and tag arguments; it isn’t necessary to format them into a string only for layout.Transport.ParseReference to split them again.

@umohnani8
Copy link
Member Author

ignore this new push, still working on it.

@umohnani8
Copy link
Member Author

@mtrmac PTAL. I have created a function createOciRef that creates the temp directory and generates the ociReference from the directory's path. The functions that need ociReference will first check to see if it exists, if not createOciRef is called.

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a quick look, didn’t go through line by line.

// newImageDestination returns an ImageDestination for writing to an existing directory.
func newImageDestination(ref ociArchiveReference, ctx *types.SystemContext) (types.ImageDestination, error) {
if ref.ociRef == nil {
if err := createOciRef(&ref); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work with

ref := ParseReference(…)
// possibly repeated:
dest := ref.NewImageDestination()
…
dest.Close() // deletes ref.ociRef.tempDirectory ?!
// possibly repeated:
src := ref.NewImageSource() // tempDirectory unavailablesrc.Close() // deletes ref.ociRef.tempDirectory ?!

the layout.Reference, and tempDirectory, AFAICS needs to be a property of the …Source / …Destination, not of the …Reference, to match the temporary directory lifetimes which terminate at *.Close.


// ValidatePolicyConfigurationScope checks that scope is a valid name for a signature.PolicyTransportScopes keys
func (t ociArchiveTransport) ValidatePolicyConfigurationScope(scope string) error {
return ocilayout.Transport.ValidatePolicyConfigurationScope(scope)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still mixing the semantics… either copy&paste layout.Transport.ValidatePolicyConfigurationScope, or extract into a shared helper which is explicitly agnostic of dir/path (i.e. rename existing layout.Transport.ValidatePolicyConfigurationScope to something like oci.ValidatePathPolicyConfigurationScope, and then call it from both {archive,layout}.Transport.ValidatePOlicyConfigurationScope

logrus.Error("error creating oci reference", err)
}
}
return ref.ociRef.StringWithinTransport()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would return a path to the temporary directory, not to the user-specified archive. And like ParseReference, StringWithinTransport should not be modifying the filesystem.

Again, copy&paste or extract into an independent helper.

logrus.Error("error creating oci reference", err)
}
}
return ref.ociRef.PolicyConfigurationIdentity()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not return a path to the temporary dir — and should not be modifying the FS here.

logrus.Error("error creating oci reference", err)
}
}
return ref.ociRef.PolicyConfigurationNamespaces()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not return a path to the temporary dir — and should not be modifying the FS here.

logrus.Error("error creating oci reference", err)
}
}
return ref.ociRef.DeleteImage(ctx)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would delete the just-created temporary directory, not the archive file.

Luckily layout.*.DeleteImage just fails; this should probably as well. Really deleting a single reference means deleting that entry in the index, and perhaps the referenced manifest and blob, it should not delete the whole archive.

@umohnani8
Copy link
Member Author

@mtrmac PTAL


// ociArchiveReference is an ImageReference for OCI Archive paths
type ociArchiveReference struct {
file string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need both? file and resolved file?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, ref.file is being used in StringWithinTransport

@umohnani8 umohnani8 force-pushed the oci_archive branch 2 times, most recently from ef4233b to df43627 Compare August 2, 2017 17:59
@umohnani8 umohnani8 changed the title Add oci-archive transport that creates a tar archive of an image [WIP]Add oci-archive transport that creates a tar archive of an image Aug 2, 2017
@umohnani8
Copy link
Member Author

#318 needs to be merged in first

@umohnani8 umohnani8 force-pushed the oci_archive branch 4 times, most recently from fa8e87a to b0fc85b Compare August 7, 2017 14:48
@umohnani8
Copy link
Member Author

@mtrmac PTAL.

@umohnani8 umohnani8 force-pushed the oci_archive branch 2 times, most recently from bab93fc to ca1b94a Compare August 8, 2017 14:42
Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from the global variable, only minor or stylistic comments I think.

tempDir string
}

var tempDirRef *tempDirOciRef
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can’t be a global variable; multiple ociArchiveImageSource instances can exist at the same time. The necessary data can probably live inside the ociArchiveImage$something, or maybe just as local variables.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I got rid of the global variable and added this to the ociArchiveReference struct. However, I ran into an issue that I have a temporary fix for.

LoadManifestDescriptor needs to be able to access the image name from the index.json for the kpod load command. Since LoadManifestDescriptor is called before newImageSource for kpod load, I tried to check if tempDirRef has a value in the instance of ociArchiveReference and setting it in one of the functions. However, two instances were being created and ociArchiveReference wouldn't retain its value.

I can't seem to figure out why the instance of ociArchiveReference is not the same - could be that LoadManifestDescriptor is being called from libkpod/copy.go. I have a temporary fix in place that just creates a temp directory and deletes it in LoadManifestDescriptor.

@mtrmac I know this might not be the ideal way, but just wanted your feedback and advice on this issue.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, two instances were being created and ociArchiveReference wouldn't retain its value.

Yeah,ociArchiveReference is defined to be a value of that struct type, i.e. every value of type ociArchiveReference is an independent copy of the value, not a reference to a shared underlying object. I.e. every method with a receiver of (ref ociArchiveReference) gets a fresh copy, and every parameter with type ociArchiveReference (notably the ref parameters to newImage{Source,Destination}) gets a copy. You are probably seeing the copies made during the newImage{Source,Destination} parameter passing.

In general, ImageReference is mostly being treated as an immutable-after-construction object, so storing state in there is a bit tricky; though at least caching information in it for speed up (without changing semantics) should be fine. The storage backend uses references that way.

OTOH attaching the lifetime of the temporary directories to ImageReference is, with the current structure, problematic, because multiple ImageSource/ImageDestination objects could be created from the same ImageReference over time (or even concurrently!), and there is no good place to delete the temporary directory: Image{Source,Destination}.Close would delete it after the first user is done, possibly breaking all other users, and there is no ImageReference.Close.

So, yeah, extracting the archive in LoadManifestDescriptor, and tearing down the extracted archive immediately in that function, and then extracting it again in NewImageSource, is the best fit with the interface structure. Arguably that’s just evidence of a badly designed interface which can’t deal with this reasonable use case… but the cost of making the LoadManifestDescriptor+NewImageSource sequence performant (forcing every user of ImageReference to treat them as stateful objects and to call ref.Close eventually, and dealing with multiple users and locking and reference counts in the ImageReference implementations) is fairly high.

Overall having LoadManifest extract the archive one more time seems like a reasonable compromise. Or, perhaps better, it could just seek inside the archive to get the manifest file, like tarfile.Source.readTarComponent does, without extracting gigabytes of data onto the disk; with an uncompressed archive this would be quite cheap. With a compressed archive we might still have to decompress gigabytes of data, but at least we wouldn’t have to write it to disk.

Perhaps we could also make a special-case exception, by documenting that a reference passed to LoadManifestDescriptor must be always, and exactly once, passed to ref.NewImageSource(), and implementing the pair of these calls as a special case, without dealing with the general case of multiple, possibly concurrent, callers to ref.NewImage{Source,Destination}. That would be quite ugly, but ugly only locally, without impacting the rest of the code, and high-performing. I guess I’d start with the extracting-twice solution (or streaming-and-extracting solution) first, and defer implementing this special case only if the performance improvement turned out to be necessary.


// newImageDestination returns an ImageDestination for writing to an existing directory.
func newImageDestination(ref ociArchiveReference, ctx *types.SystemContext) (types.ImageDestination, error) {
tempDirOciRef, err := createOciRef(ref.image)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This creates an one-time named structure, unpacks it in the same function, and then throws it away. Perhaps createOciRef could just return a tuple of three values (reference, path, error) instead of bothering to define and return a structure.

OTOH structures with named members are more explicit than tuples with positional results, perhaps it is worth it. Or if deleteTempDir became a method on the structure.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't happen anymore. Fixed

func (d *ociArchiveImageDestination) SupportsSignatures() error {
err := d.unpackedDest.SupportsSignatures()
if err != nil {
if err := deleteTempDir(); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SupportsSignatures returning an error is not a critical failure, no need to clean stuff up just because SupportsSignatures returns an error. Besides, Close will do the cleanup anyway (the caller is expected to ensure that Close is eventually called).

func (d *ociArchiveImageDestination) PutBlob(stream io.Reader, inputInfo types.BlobInfo) (types.BlobInfo, error) {
valBlobInfo, err := d.unpackedDest.PutBlob(stream, inputInfo)
if err != nil {
if err := deleteTempDir(); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, the cleanup shouldn’t be necessary. And in other places in this file.

src := d.tempDir
// path to save tarred up file
dst := d.ref.resolvedFile
if err := tar(src, dst); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This works just fine; I can’t shake a feeling that maybe tar could be renamed to something that makes the semantics of its arguments explicit just by reading the call, and then the src/dst assignments and the comments could be unnecessary. But I can’t think of a good alternative name for tar; tarDirectoryTo feels awkward and insufficient. Perhaps someone smarter than me can find a better name; if not, this is quite readable as is.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to tarDirectory - makes more sense I guess.

ociRef types.ImageReference
}

// createOciRef creates the oci reference of the image
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should say layout / extracted or something like that; ociArchiveReference already is an “OCI” reference, that’s not quite the point.

// dir is a temporary directory used to generate the OCI image
dir, err := ioutil.TempDir("/var/tmp", "oci")
if err != nil {
return nil, errors.Errorf("error creating file %v", err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: s/file/directory/, and either there should be a colon before %v, or use errors.Wrapf


// createOciRef creates the oci reference of the image
func createOciRef(image string) (*tempDirOciRef, error) {
// dir is a temporary directory used to generate the OCI image
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generate… or untar and read. Perhaps it’s simpler to delete the comment than to make it exactly correct :)

@@ -10,6 +10,7 @@ import (
_ "github.com/containers/image/docker"
_ "github.com/containers/image/docker/archive"
_ "github.com/containers/image/docker/daemon"
_ "github.com/containers/image/oci/archive"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this have a smoke test with the others?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

working on the tests

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

… this is also outstanding, not blocking I guess.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added tests for oci_transport, are there any other tests that I am missing?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

transports/alltransports/alltransports_test.go:TestImageNameHandling

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added the test. @mtrmac PTAL


// newImageDestination returns an ImageDestination for writing to an existing directory.
func newImageDestination(ref ociArchiveReference, ctx *types.SystemContext) (types.ImageDestination, error) {
tempDirOciRef, err := createOciRef(ref.image)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Throughout: For no really good reason, it seems that the Go convention is to use all lowercase for an acronym at the start of an identifier, but all uppercase elsewhere. So ociArchiveReference, but createOCIRef.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@umohnani8 umohnani8 force-pushed the oci_archive branch 2 times, most recently from d13e472 to a464523 Compare August 14, 2017 16:42

// deletes the temporary directory created
func (t *tempDirOciRef) deleteTempDir() error {
if t != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can t ever be nil here? The check doesn’t really hurt of course, but we don’t have every single method of every single type protecting against the receiver being nil.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, it can't really ever be nil when that function is called. Took the check out - fixed.


// SupportsSignatures returns an error (to be displayed to the user) if the destination certainly can't store signatures
func (d *ociArchiveImageDestination) SupportsSignatures() error {
return nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also forward to d.unpackedDest (which does not return nil!)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mtrmac are you saying this should return d.unpackedDest also?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return d.unpackedDest.SupportsSignatures()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm, I think I got what you meant. Fixed.


descriptor, err := ocilayout.LoadManifestDescriptor(ociArchRef.tempDirRef.ociRefExtracted)
if err != nil {
if err := ociArchRef.tempDirRef.deleteTempDir(); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There already is a defer ….deleteTempDir() above; and deleting the same name from a shared */tmp directory twice is not, strictly speaking, safe (the name could have been reused by an unrelated process in the meantime).

If the point is to not ignore the error, that can be done in the defer block, e.g. see how copy.Image does that—but only if it is really important to preserve the cleanup error.

In this particular case it seems that the error loading index error is more important than the error deleting temporary directory, at least the error loading index one shouldn't be thrown away; though notifying the user that the cleanup failed might also be useful, it is less important. (See also the discussion around #236 (comment) if you care.)

Copy link
Member Author

@umohnani8 umohnani8 Aug 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the error loading index error is more important, I didn't do what copy.Image does. Just took out the second deleteTempDir() and left it as a defer. Fixed.

@@ -177,8 +177,12 @@ func (d *ociImageDestination) PutManifest(m []byte) error {
return err
}

if d.ref.image == "" {
return errors.Errorf("cannot save image with empyt image.ref.name")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(ignoring the oci/layout parts, to be reviewed in #318 ).

tempDir string
}

var tempDirRef *tempDirOciRef
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, two instances were being created and ociArchiveReference wouldn't retain its value.

Yeah,ociArchiveReference is defined to be a value of that struct type, i.e. every value of type ociArchiveReference is an independent copy of the value, not a reference to a shared underlying object. I.e. every method with a receiver of (ref ociArchiveReference) gets a fresh copy, and every parameter with type ociArchiveReference (notably the ref parameters to newImage{Source,Destination}) gets a copy. You are probably seeing the copies made during the newImage{Source,Destination} parameter passing.

In general, ImageReference is mostly being treated as an immutable-after-construction object, so storing state in there is a bit tricky; though at least caching information in it for speed up (without changing semantics) should be fine. The storage backend uses references that way.

OTOH attaching the lifetime of the temporary directories to ImageReference is, with the current structure, problematic, because multiple ImageSource/ImageDestination objects could be created from the same ImageReference over time (or even concurrently!), and there is no good place to delete the temporary directory: Image{Source,Destination}.Close would delete it after the first user is done, possibly breaking all other users, and there is no ImageReference.Close.

So, yeah, extracting the archive in LoadManifestDescriptor, and tearing down the extracted archive immediately in that function, and then extracting it again in NewImageSource, is the best fit with the interface structure. Arguably that’s just evidence of a badly designed interface which can’t deal with this reasonable use case… but the cost of making the LoadManifestDescriptor+NewImageSource sequence performant (forcing every user of ImageReference to treat them as stateful objects and to call ref.Close eventually, and dealing with multiple users and locking and reference counts in the ImageReference implementations) is fairly high.

Overall having LoadManifest extract the archive one more time seems like a reasonable compromise. Or, perhaps better, it could just seek inside the archive to get the manifest file, like tarfile.Source.readTarComponent does, without extracting gigabytes of data onto the disk; with an uncompressed archive this would be quite cheap. With a compressed archive we might still have to decompress gigabytes of data, but at least we wouldn’t have to write it to disk.

Perhaps we could also make a special-case exception, by documenting that a reference passed to LoadManifestDescriptor must be always, and exactly once, passed to ref.NewImageSource(), and implementing the pair of these calls as a special case, without dealing with the general case of multiple, possibly concurrent, callers to ref.NewImage{Source,Destination}. That would be quite ugly, but ugly only locally, without impacting the rest of the code, and high-performing. I guess I’d start with the extracting-twice solution (or streaming-and-extracting solution) first, and defer implementing this special case only if the performance improvement turned out to be necessary.

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good. Outstanding:

  • The reference naming etc. + tests, need to be reviewed in Fix oci to save the full image name in index.json #318 first.
  • Lifetime of the temporary directories. I agree with the simple solution of having LoadManifestDescriptor extract the archive again; in that case, the temporary directories should probably be attached to the ociArchive{Source,Destination} objects, not ociArchiveReference EDIT, which should be more or less immutable.

@umohnani8
Copy link
Member Author

@mtrmac are you saying that the tempDirOciRef struct should be attached to ociArchiveSource and ociArchiveDestination instead of ociArchiveReference?
If that is the case LoadManifestDescriptor takes in a types.ImageReference, which can be typecasted to ociArchivereference, so it made sense to me to attach it to ociArchiveReference.

@mtrmac
Copy link
Collaborator

mtrmac commented Aug 30, 2017

@mtrmac are you saying that the tempDirOciRef struct should be attached to ociArchiveSource and ociArchiveDestination instead of ociArchiveReference?

Yes, probably—if we can live with the performance impact.

If that is the case LoadManifestDescriptor takes in a types.ImageReference, which can be typecasted to ociArchivereference, so it made sense to me to attach it to ociArchiveReference.

Yes, and extracting the archive only once would be nice; but types.ImageReference has no lifetime or sharing semantics, and the deletion of directories happens in Image{Source,Destination}.Close, so the creation should happen in the same object or something with a closely related lifetime.

Consider

ref := archive.NewReference(…)
src1 := ref.NewImageSource()  // Extracts temporary directory, sets ref.tempDirRef
src2 := ref.NewImageSource() // Does this reuse the directory from src1, or create a new one?
// Either way, src1 and src2 now both use ref.tempDirRef// use src1, src2
src1.Close() // This deletes ref.tempDirRef; does that break src2’s use of it?
LoadManifestDescriptor(ref) // Similar concerns as newImageSource; does the cleanup break src2 again?
src2.Close()

Supporting something like this, while sharing the extracted directory, would require ref to keep a (thread-safe?) reference count, and probably a ImageReference.Close to clean up, and be somewhat resilient against concurrent uses of ImageSource/ImageDestination on the same ref. It’s not impossible but it’s a fair amount of work.

Also, if you look at what the current code actually does, createOCIRef creates a new temporary directory on each call, and createUntarTempDir extracts the archive on each call; neither is reused across uses of ref. So, the code is already effectively giving a separate temporary directory to each Image{Source,Destination} instance; storing it in those instances directly instead of the shared ImageReference object would be both simpler and eliminate the problems with .Close() deleting a directory used by another user of the reference.

@rhatdan
Copy link
Member

rhatdan commented Aug 30, 2017

Lets go with the slow mode for now, and then if we find it to be a performace issue, we can revisit this in the future.

Fix oci to save the full image name "image:tag" instead of just the tag in index.json.
Add function to retrieve the image name from index.json when loading or pulling image from oci directory.

Signed-off-by: umohnani8 <umohnani@redhat.com>
@umohnani8 umohnani8 force-pushed the oci_archive branch 2 times, most recently from 215c7d0 to 02be3b2 Compare August 31, 2017 15:23
@umohnani8 umohnani8 changed the title [WIP]Add oci-archive transport that creates a tar archive of an image Add oci-archive transport that creates a tar archive of an image Aug 31, 2017
@umohnani8
Copy link
Member Author

@mtrmac attached the tempDirOCIRef struct to ociArchiveImageSource and ociArchiveImageDestination. PTAL

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, one very last nit.

(Thanks for keeping this up-to-date with #318!)

// NewImageDestination returns a types.ImageDestination for this reference.
// The caller must call .Close() on the returned ImageDestination.
func (ref ociArchiveReference) NewImageDestination(ctx *types.SystemContext) (types.ImageDestination, error) {
return newImageDestination(ref, ctx)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep the ctx parameter in the first position in newImage{Source,Destination}, to be consistent with other users.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@runcom
Copy link
Member

runcom commented Aug 31, 2017

LGTM

Approved with PullApprove

Signed-off-by: umohnani8 <umohnani@redhat.com>
@mtrmac
Copy link
Collaborator

mtrmac commented Aug 31, 2017

👍

Thanks!

Approved with PullApprove

@mtrmac mtrmac merged commit 960e725 into containers:master Aug 31, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants