Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plugins/bundle: Support for saving and reading bundles from disk #2702

Merged

Conversation

ashutosh-narkar
Copy link
Member

This commit adds support to persist and load bundles from disk.
A new field is introduced in OPA's bundle configuration that can
be optionally set to specify the location on disk to write and read
bundles. This feature will allow OPA to serve policy decisions in scenarios
such as OPA being unable to communicate with the bundle server.

Fixes #2097

Signed-off-by: Ashutosh Narkar anarkar4387@gmail.com

@@ -97,6 +97,7 @@ bundles:
authz:
service: acmecorp
resource: somedir/bundle.tar.gz
persistence_file_location: /tmp/example
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just kind of thinking out-loud here but would it make sense to have a default cache path and then more like a boolean opt-in for keeping local copies for each bundle?

That way we can default to some safe-ish internal thing like /var/lib/opa/ or ~/.opa/ or whatever and someone can turn it on or off by configuring a bundle like:

authz:
  service: acmecorp
  resource: somedir/bundle.tar.gz
  cache:
    persist: true

(or whatever we want to call it)

Without having to worry about where it goes, more than likely they don't care. We can still allow for customizing it with like:

authz:
  service: acmecorp
  resource: somedir/bundle.tar.gz
  cache:
    persist: true
    persistence_file_location: /tmp/example

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of adding a default persist location. Regarding the persist field, if a user sets persist: false or does not specify it but sets persistence_file_location that would be an error or do we imply persist: true ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say yes, but to start I don't think we'll need the persistence_file_location.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed persistence_file_location and added persist field.

bundleDir := filepath.Join(path, name)
var backupBundleDir string

// if a bundle already exists, create a backup and then write the new bundle
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to consider adjusting the order for these. Making the backup and restoring on error is good, but if OPA was stopped/crashed/etc mid-way through writing the new one I think we'd end up in trouble (the deferred calls to restore the backup would never get called)

A common strategy for this kind of thing is to write into a temporary file, and only after a successful write with the whole thing we swap it in with a like mv sort of operation on the filesystem so it is atomic. The goal being that you don't allow for a period of time where the file we've persisted is in a partially written state. From a high level I'd imagine the steps to be more like:

  • Write to filepath.Join(bundleDir, ".bundle.tar.gz.tmp"). or whatever we want to call it, the gist being that its in the same directory and similar name but is a hidden file and somewhat obvious that its a partial or temp file.
  • If it fails, delete that new/temporary file. No restore needed, the original is still there.
  • If it succeeds, use os.Rename() to change the tmp file to the "real" path. This will overwrite the old version (same as like mv -f)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the backup/restore logic.

Comment on lines 487 to 518
// if a bundle already exists, write the new bundle to a temporary file and if successful, replace
// the old bundle with the new one
if _, err := os.Stat(filepath.Join(bundleDir, "bundle.tar.gz")); err == nil {
err := saveCurrentBundleToDisk(bundleDir, ".bundle.tar.gz.tmp", b)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should always write to the temp file and do the rename to avoid a case where OPA has written part of it, stopped, restarted, and tries to read the partial file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

@patrick-east patrick-east left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, just a few minor comments 👍

One thing that occurs to me is that the plugin status looks like it would only be set to "ready" after it downloads the bundle versus loading the one from disk. Was that something that was discussed previously?

Comment on lines 277 to 279
if p.status[name].Metrics == nil {
p.status[name].Metrics = metrics.New()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we would want to overwrite the metrics here with a new one, even if (for whatever reason) its non-nil. I think the steps in process() try and merge it with the stuff from the download update, but it is still sort of 1:1 for a metric object and an attempt to load a bundle.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 359 to 361
p.logDebug(name, "Persisting bundle to disk in progress.")

path, err := getDefaultBundlePersistPath()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I might have missed it below, but we might want to include the path in at least one of the debug logs for saving the bundle to help with troubleshooting.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated a debug log to include bundle persistence path on disk.

Comment on lines 512 to 532
if _, err := os.Stat(filepath.Join(bundleDir, "bundle.tar.gz")); err == nil {
err := saveCurrentBundleToDisk(bundleDir, ".bundle.tar.gz.tmp", b)
if err != nil {
return os.Remove(filepath.Join(bundleDir, ".bundle.tar.gz.tmp"))
}
} else {
err := saveCurrentBundleToDisk(bundleDir, ".bundle.tar.gz.tmp", b)
if err != nil {
os.Remove(filepath.Join(bundleDir, ".bundle.tar.gz.tmp"))
return err
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove the if/else here, both do the same thing, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's one difference: If you try to save a bundle to disk when there isn't one already, we return the error that occurred while saving it. OTOH, if there is an exiting bundle and the new one could not be saved, we return the error from removing the temp file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh ok, I see it now. Looking at where the error is used, would we want the bundle status to go into an error state if the temp file cleanup failed? Seems like maybe a log message or something to warn about it, but otherwise the bundle that was activated is still ok, right?

I guess thinking about it more we should be careful to at least log any of the errors (from save, cleanup, etc), especially if they don't affect the bundle status. Future us reading some bug report with a log will probably be thankful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a warning log to capture errors that don't affect bundle status.

Comment on lines 1575 to 1608
b := getTestBundle(t)

srcDir, err := ioutil.TempDir("", "")
if err != nil {
t.Fatalf("unexpected error %v", err)
}

defer os.RemoveAll(srcDir)

err = saveBundleToDisk(srcDir, "foo", &b)
if err != nil {
t.Fatalf("unexpected error %v", err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be missing something, but what is this first bundle used for in the test? Would it make sense to split this up into separate test cases? (seems like there is maybe more than one thing being tested)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split the test into 2 cases.

ctx := context.Background()
manager := getTestManager()

dir, err := getDefaultBundlePersistPath()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to provide some mechanism to override the default path, even if only for tests, to avoid having the unit tests clutter the working dir (the deferred cleanup only works if the function finishes, crashing in the middle or getting killed by the user can leave stuff behind). Would be ideal if the unit tests were always using temp directories.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a mechanism for tests to override the default bundle persistence path.

@ashutosh-narkar ashutosh-narkar force-pushed the bundle-disk-support branch 3 times, most recently from 7f8e246 to 1b87df3 Compare September 30, 2020 18:12
Comment on lines 517 to 532
if _, err := os.Stat(filepath.Join(bundleDir, "bundle.tar.gz")); err == nil {
err := saveCurrentBundleToDisk(bundleDir, ".bundle.tar.gz.tmp", b)
if err != nil {
p.logWarn(name, "Failed to save new bundle to disk: %v", err)
err = os.Remove(filepath.Join(bundleDir, ".bundle.tar.gz.tmp"))
if err != nil {
p.logWarn(name, "Failed to remove temp file ('.bundle.tar.gz.tmp'): %v", err)
}
return nil
}
} else {
err := saveCurrentBundleToDisk(bundleDir, ".bundle.tar.gz.tmp", b)
if err != nil {
rErr := os.Remove(filepath.Join(bundleDir, ".bundle.tar.gz.tmp"))
if rErr != nil {
p.logWarn(name, "Failed to remove temp file ('.bundle.tar.gz.tmp'): %v", rErr)
}
return err
}
}
Copy link
Contributor

@patrick-east patrick-east Sep 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can refactor this to reduce duplicate code like:

	tmpFile := filepath.Join(bundleDir, ".bundle.tar.gz.tmp")
	bundleFile := filepath.Join(bundleDir, "bundle.tar.gz")

	saveErr := saveCurrentBundleToDisk(bundleDir, ".bundle.tar.gz.tmp", b)
	if saveErr != nil {
		p.logWarn(name, "Failed to save new bundle to disk: %v", saveErr)

		if err := os.Remove(tmpFile); err != nil {
			p.logWarn(name, "Failed to remove temp file ('%s'): %v", tmpFile, err)
		}

		if _, err := os.Stat(bundleFile); err == nil {
		    p.logWarn(name, "Older version of activated bundle persisted, ignoring error")
			return nil
		}
		return saveErr
	}
	
	return os.Rename(tmpFile, bundleFile)

(note there is a log message when we ignore the error too)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored the code as per your suggestion.

patrick-east
patrick-east previously approved these changes Sep 30, 2020
Copy link
Contributor

@patrick-east patrick-east left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM. Theres one refactor suggestion but nothing blocking, feel free to merge when ready.

patrick-east
patrick-east previously approved these changes Sep 30, 2020
This commit adds support to persist and load bundles from disk.
A new field is introduced in OPA's bundle configuration that can
be optionally set to enable OPA to write and read bundles from disk.
This feature will allow OPA to serve policy decisions in scenarios
such as OPA being unable to communicate with the bundle server.

Fixes open-policy-agent#2097

Signed-off-by: Ashutosh Narkar <anarkar4387@gmail.com>
@ashutosh-narkar ashutosh-narkar merged commit 5a79a45 into open-policy-agent:master Sep 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for offline recovery by saving and reading bundles from disk
3 participants