Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delta Bundles #1055

Closed
tsandall opened this issue Nov 6, 2018 · 16 comments
Closed

Delta Bundles #1055

tsandall opened this issue Nov 6, 2018 · 16 comments
Assignees
Labels
design distribution Issues related to the bundle plugin

Comments

@tsandall
Copy link
Member

tsandall commented Nov 6, 2018

The bundle protocol is designed around polling and snapshots. As a result, bundles are not well-suited for cases that require fast propagation of policy and data changes and/or small changes to large snapshots.

To address these issues we could extend OPA and the bundle protocol to support:

  1. Long-lived HTTP connections that serve bundles and bundle deltas.
  2. Delta-based bundles that contain patches to policy and data instead of snapshots.

By combining long-lived HTTP connections that stream bundle deltas to clients, bundle services can propagate changes to OPAs quickly and efficiently.

Example HTTP interaction

GET /bundles/examplia/gratia HTTP/1.1
If-None-Match: "etag-or-revision-of-successful-activation"
Prefer: stream
HTTP/1.1 200 OK
Content-Type: application/vnd.openpolicyagent.bundles
Transfer-Encoding: chunked
ETag: "etag-of-new-bundle"
Cache-Control: max-age=300

Details:

  • The message body contains stream of snapshot and delta bundles.
  • The chunk size specifies the size of the bundle.
  • If the connection between the client and server is closed for any reason, the client will reconnect using the etag/revision of the last successful activation (this is the current behaviour.)

Differences from existing protocol:

  • Client includesPrefer: stream HTTP header
  • Server replies with application/vnd.openpolicyagent.bundles if streaming is supported.

If the client specifies Prefer: stream and the server does not support it, the client will fallback to polling.

Delta bundles

We'll introduce a new kind of bundle that encodes a series of data patch and policy update operations. Delta bundles have the same structure and layout semantics as snapshot bundles. Differences between snapshot and delta bundles:

  • Clients will read patch.json files out of delta bundles. patch.json files will contain a JSON Patch (i.e., an array of JSON objects.) The operations in the JSON Patch will be applied to OPA's in-memory store in order. The JSON Patch sets will be applied in alphabetical order (e.g., a/b/c/patch.json will be applied before a/b/c/d/patch.json.)
  • Clients will insert or update policies inside the delta bundle. Policies that are not listed in the delta bundle will be carried over from the previous bundle.
  • Clients will remove policies listed in the manifest (e.g., {"removed_policies": ["path/to/policy.rego"], ...})
  • Clients will use the "revision" value from the manifest for the If-None-Match header when executing subsequent bundle GET requests.

Example Delta Bundle

patch.json
.manifest

patch.json:

[
  {"op": "add", "path": "/roles/foo", "value": {"method": "get", "path": "widgets"}}
]

Open Questions

  • It's not immediately clear how this feature would work with bundle signatures (Signature support in OPA bundles #1757). The initial implementation does not have to address this but the design should account for it.
@tsandall tsandall self-assigned this Nov 6, 2018
@tsandall
Copy link
Member Author

tsandall commented Nov 8, 2018

Another benefit to the patch approach is that it would allow us to have data loaded into OPA from different sources (e.g., kube-mgmt). It would take a bit more work to allow patching of policies but that's also possible.

@koponkin
Copy link

Hi @tsandall! Is this issue in progress? Is there a release date for it?

@tsandall
Copy link
Member Author

hi @koponkin, we're not working on this currently and we have a full plate of features between now and November. If you or someone else would like to work on this, I could provide pointers. If not, we can certainly revisit this around November.

@koponkin
Copy link

koponkin commented Sep 23, 2019

Thanks for the update!

we can certainly revisit this around November.

Will wait for this awesome feature.

@yyunikov
Copy link

yyunikov commented Oct 9, 2019

@tsandall Would be great for us to have this one.

Just to add some context we have a number of services where each of them has instance of OPA. There is one service which provides bundles for all OPA's and taking into account number of instances and large amounts of data in the bundle it's very hard to scale it. We can only scale it vertically by adding more memory to fix an issue with lots of data.

Delta bundles would reduce this issue a lot, so we're waiting for this feature.

@zzFluke
Copy link

zzFluke commented Feb 13, 2020

Hi @tsandall, I wonder is this under development or planned for some near time? If not, I would consider contribution to this, so would you be so kind to share "pointers" you've mentioned above.

Thanks in advance!

@ShevtsovBohdan
Copy link

ShevtsovBohdan commented Feb 13, 2020

@tsandall Wery useful feature. It was discussed in this stream on the slack channel

To add more context:

Our system stores all the data in the DB. So every time some small piece of data is updated in the DB we will have to grab all the data and add it to the bundle. I expect that it will add unnecessary latency, so partial update of the data would be a great feature.

Some implementation thoughts:
OPA already supports manifest files inside bundles that contains "roots" field. So, from my perspective, we can erase only those data files that are stored under specified "roots".

So, for example:
With first OPA pull request, server responds with bundle that contains folder structure

    .manifest
    <tenantFolder1> 
          <tenantDataFolder>
               data.json

.manifest file contains:

  {
   "revision":  "1" 
  } 

Since the .manifest file does not contain the "roots" field, OPA will erase all the existing data and load files from the bundle.

With second OPA pull request, server responds with the bundle:

    .manifest  
    <tenantFolder2> 
          <tenantDataFolder>
               data.json

.manifest file contains:

  {
   "revision":  "2", 
   "roots": ["tenantFolder2"]
  }

Since .manifest file contains "roots" field, data will be erased only under specified "roots".

What do you think about such decision?

Thanks.

P.S.: I'm ready to try to work on this, so you could provide pointers.

@tsandall
Copy link
Member Author

@ShevtsovBohdan sorry for the delayed response, not sure how I missed your update. A few thoughts.

Since the .manifest file does not contain the "roots" field, OPA will erase all the existing data and load files from the bundle.

When roots is missing/undefined, it defaults to ["/"], i.e., specifying roots: ["/"] is the same as omitting the roots key. This essentially tells OPA that the bundle is responsible for ALL of the policies and data.

Since .manifest file contains "roots" field, data will be erased only under specified "roots".

The problem is that bundles are currently intended to be snapshots of state for OPA to load, i.e., they contain ALL of the policy and data that OPA should load. With multi-bundle feature you can load multiple bundles into OPA but they have to be disjoint (they can still refer to each other but they can't overlap in the data namespace.)

Given that bundles are snapshots describe the full state of OPA, when a new bundle is received, OPA must erase the state from the previous bundle. It can't leave portions of the old bundle around.

In the issue description, the proposal is to extend bundles to communicate more than just snapshots, i.e., they can communicate deltas. There are probably other equally viable ways of representing deltas but the design sketched out in the issue seems reasonable.

One thing I'm unsure about is performance. Currently each time a bundle is downloaded and activated, OPA has to re-parse and compile the policies. This is somewhat expensive. If delta bundles are being downloaded, I would expect the simplest implementation would still re-parse and compile everything on each delta. For simplicity, perhaps we should only support deltas to JSON (not allow insert/update/remove of .rego files in delta bundles.) Question for you @ShevtsovBohdan: Do you need the ability to insert/update/remove specific .rego files in your delta bundes or would it be enough to support deltas on data.json files?

If you'd like to work on this issue that would be great. @patrick-east refactored a bunch of the bundle activation code so he should probably weigh in here but I think that something like this would make sense (in order):

  1. Extend the opa/bundle package to include support for deltas. This probably requires (i) updates to the manifest to indicate the bundle is a delta bundle (this is just a single bit, e.g., type: delta) (ii) updates to the manifest to support removal of .rego files (outlined above, maybe not required if we don't support deltas on policies yet) (iii) updates to the bundle structure to include "patches" and the reader to load "patch.json" files. For this last point we could simplify the reader implementation initially and only permit a single "patch.json" at the root of the bundle. The bundle validation code will also have have to get updated to verify that bundles are not trying to remove policies outside of the bundle's roots or patch data outside of the bundle's roots.

  2. The bundle activation implementation in (opa/bundle#Activate) can be updated to support the new delta bundles. I'm a bit hazy on the details here, @patrick-east can give better pointers.

  3. The opa/downloader package can be updated to support long lived connections w/ long polling as described in the issue description.

From an implementation/review perspective, it would be good to break these changes up into separate PRs so we can review them without getting overwhelmed. Obviously all three of the stages require test coverage (we already have a bunch of integration tests around this stuff so it should be mostly additive) and documentation.

Before we move forward we should make sure the performance is going to be acceptable or we need to decide to only support patch.json files to start.

@ShevtsovBohdan
Copy link

@tsandall Thanks for the detail explanation.

Now we are using one .rego file, so basically we can update it using the API. So, the answer to the question would be No, so far we don't need the ability to insert/update/remove specific .rego files in our delta bundles.

Some clarification: am I understand correctly that using the approach described in the issue, we can re-use some logic from the PUT endpoint?

@tsandall
Copy link
Member Author

Some clarification: am I understand correctly that using the approach described in the issue, we can re-use some logic from the PUT endpoint?

Yes, the storage layer supports JSON patch-style operations on the data document (which is what the server's /v1/data PUT endpoint uses.)

One area that hasn't been fleshed out in the design is how this feature would interact with bundle signatures (#1757).

@amar1728
Copy link

Hi @tsandall Is this issue under development or in planned state? Any estimate on when this feature could land?

@tsandall
Copy link
Member Author

@amar1728 it's not under development yet. We don't have an estimate on when it will be delivered yet.

@tsandall tsandall changed the title Improve support for propagating policy and data updates with bundles Delta Bundles Dec 8, 2021
@srenatus
Copy link
Contributor

srenatus commented Mar 1, 2022

💭 If it's done, we can close it, can't we?

@ashutosh-narkar
Copy link
Member

Just to clarify in the current version of delta bundles, only changes to data are supported and not policy. We will work on delta support for policy in the future.

@anderseknert
Copy link
Member

I'd suggest we close this one as done and create a new ticket for policy deltas if/when we get to that.

@anderseknert
Copy link
Member

Closing issue as delta bundles are implemented in OPA. If there's demand to have this extended to policy later, I'd suggest we create a new issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design distribution Issues related to the bundle plugin
Projects
Status: Done
Development

No branches or pull requests

9 participants