
Authorization gateway - basic features #138

Merged (2 commits) Jun 15, 2020
Conversation

@tomeresk (Contributor) commented Jun 14, 2020

Implemented:

  • Continuous in-memory caching of the compiled policy WASM from the resource repository
  • Policy directive that can run OPA policy types, with support for args using param injection

Not yet implemented (Planned for future PRs, not this one):

  • Unit tests for opa.ts and policy-executor.ts in the policy directive folder
  • JWT info injection into Rego code
  • Policy directive parameter that allows choosing whether any or all of the required policies should pass (basically And vs Or). Currently it always requires all mentioned policies to pass (And).
  • Query evaluation (GraphQL and Policy types)
  • Memoization and performance optimizations
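The args-with-param-injection feature mentioned above could work along these lines (a minimal sketch with hypothetical names, not the PR's actual API): a policy arg written as `{args.id}` or `{source.id}` gets resolved against the resolver's inputs at execution time.

```typescript
// Hypothetical sketch of policy-arg param injection: string values like
// "{args.userId}" or "{source.id}" are replaced with the matching resolver
// input at execution time. All names here are illustrative only.
type ResolverInputs = {
  source: Record<string, unknown>;
  args: Record<string, unknown>;
};

function injectParam(value: unknown, inputs: ResolverInputs): unknown {
  if (typeof value !== 'string') return value;
  const match = /^{(source|args)\.(\w+)}$/.exec(value);
  if (!match) return value; // plain literal, pass through unchanged
  const [, scope, key] = match;
  return inputs[scope as keyof ResolverInputs][key];
}

function injectPolicyArgs(
  policyArgs: Record<string, unknown>,
  inputs: ResolverInputs
): Record<string, unknown> {
  const result: Record<string, unknown> = {};
  for (const [name, value] of Object.entries(policyArgs)) {
    result[name] = injectParam(value, inputs);
  }
  return result;
}
```

For example, `injectPolicyArgs({ userId: '{args.id}', role: 'admin' }, { source: {}, args: { id: 42 } })` would yield `{ userId: 42, role: 'admin' }` under this sketch.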

@Yshayy (Contributor) left a comment

Is it possible to implement the policy resource abstraction in a layer above the ResourceRepository? I think it adds complexity to something that should be "stupid" storage.

queries?: QueriesResults;
};

export type QueriesResults = {
Contributor

Maybe QueryDefinition instead of QueryResults

Contributor Author

This contains the results of all the queries associated with a specific directive execution, not the definition (which is why the schema here is not strict, since we don't know what the results will look like).

The query definition with the strict schema is defined here

Comment on lines 134 to 139
const params: any = {
Bucket: this.config.bucketName,
MaxKeys: 1000,
Prefix: this.config.policyAttachmentsKeyPrefix,
};
if (continuationToken) params['ContinuationToken'] = continuationToken;
Contributor

Suggested change
const params: any = {
Bucket: this.config.bucketName,
MaxKeys: 1000,
Prefix: this.config.policyAttachmentsKeyPrefix,
};
if (continuationToken) params['ContinuationToken'] = continuationToken;
const params: AWS.S3.Types.ListObjectsV2Request = {
Bucket: this.config.bucketName,
MaxKeys: 1000,
Prefix: this.config.policyAttachmentsKeyPrefix,
};
if (continuationToken) params.ContinuationToken = continuationToken;

Contributor Author

👌
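For context on the ContinuationToken handling above: ListObjectsV2 returns at most 1000 keys per response, so listing all policy attachments means looping until no continuation token comes back. A generic sketch of that loop (SDK-agnostic; the page-fetcher shape here is illustrative, not the PR's actual code):

```typescript
// Generic continuation-token pagination loop, as used by S3's ListObjectsV2:
// each page returns items plus an optional token pointing at the next page.
// The page fetcher is abstracted so this can be tested without the AWS SDK.
type Page<T> = { items: T[]; nextToken?: string };

async function listAll<T>(
  fetchPage: (token?: string) => Promise<Page<T>>
): Promise<T[]> {
  const all: T[] = [];
  let token: string | undefined;
  do {
    const page = await fetchPage(token);
    all.push(...page.items);
    token = page.nextToken;
  } while (token !== undefined);
  return all;
}
```

In the real repository, `fetchPage` would wrap `s3.listObjectsV2(params)` and pass the token through as `ContinuationToken`.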

this.policyAttachmentsRefreshedAt = newRefreshedAt;
}

private shouldRefreshPolicyAttachment({filename, updatedAt}: {filename: string; updatedAt: Date}) {
Contributor

Export {filename: string; updatedAt: Date} as type
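The suggested extraction might look roughly like this (the type name and the staleness rule in the helper are my assumptions, not necessarily what the PR ended up with):

```typescript
// Hypothetical named type for the inline shape used by
// shouldRefreshPolicyAttachment; the name is illustrative.
export type PolicyAttachmentMetadata = {
  filename: string;
  updatedAt: Date;
};

// Assumed staleness rule: refresh if we have never synced,
// or if the stored copy is older than the listed file.
export function shouldRefresh(
  meta: PolicyAttachmentMetadata,
  refreshedAt?: Date
): boolean {
  return refreshedAt === undefined || meta.updatedAt > refreshedAt;
}
```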

Contributor Author

👌

const policies = this.args.policies;

field.resolve = async (parent: any, args: any, context: RequestContext, info: GraphQLResolveInfo) => {
const executor = new PolicyExecutor(policies, parent, args, context, info);
@AleF83 (Contributor), Jun 15, 2020

I think that PolicyExecutor can be a stateless class with static methods

Contributor Author

In the current situation I would agree, but it's a WIP that may later hold shared execution context for the current request, for optimizations and memoization of query results, among other things. I eventually removed those optimizations for now and deferred their implementation to a separate task.

If we end up implementing the optimizations in another way that does not use this class for it, we can convert it later

Contributor

I agree, but actually it can be just a function in its current state. If we need to make it more complex, we can do that then.

Contributor Author

Looking at the code now, if we change it to a stateless class we would have to pass a lot of arguments between functions. That would probably be annoying enough that code which could be split into functions would instead remain as bigger functions, which are harder to work with and test, just to avoid passing all that context around.

Contributor

It's 4-5 parameters at most. On the other hand, it encourages you to write pure, functional code where that can be done.

Comment on lines +59 to +60
policyArgs[policyArgName] = policyArgValue;
return policyArgs;
Contributor

Suggested change
policyArgs[policyArgName] = policyArgValue;
return policyArgs;
return { ... policyArgs, [policyArgName]: policyArgValue };

Contributor Author

Since this code will run on every field with a policy directive, it can potentially run very often.

Changing to your suggestion means that each iteration of the reduce loop would create a new accumulator object, and the old one would have to be garbage collected.
If I keep the code as-is and reuse the same object for the entire loop, that object is garbage collected only once, when the loop ends, instead of once per iteration.
It's a very small optimization, but it adds up when there are a lot of them in hot code paths.
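To make the trade-off concrete, here are the two reduce variants side by side (a sketch; the entry shape is assumed, not taken from the PR). Both build the same object; they differ only in allocations per iteration:

```typescript
// Mutating accumulator: one object allocated for the whole loop.
function buildArgsMutating(entries: [string, unknown][]): Record<string, unknown> {
  return entries.reduce((policyArgs, [name, value]) => {
    policyArgs[name] = value;
    return policyArgs;
  }, {} as Record<string, unknown>);
}

// Spread accumulator: a fresh object allocated on every iteration,
// leaving the previous one for the garbage collector.
function buildArgsSpread(entries: [string, unknown][]): Record<string, unknown> {
  return entries.reduce(
    (policyArgs, [name, value]) => ({ ...policyArgs, [name]: value }),
    {} as Record<string, unknown>
  );
}
```

The spread version is arguably more idiomatic functional style; the mutating version avoids O(n) short-lived allocations, which is the author's point about hot code paths.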

@tomeresk (Contributor Author)

Is it possible to implement the policy resource abstraction in a layer above the ResourceRepository? I think it adds complexity to something that should be "stupid" storage.

I generally agree; it was that way initially, but after discussing it with Aviv we decided to make the change for practical reasons.
There were some complications around saving the in-memory copy across the different underlying repository APIs. Aviv hit the same issue while implementing the resource group itself and decided to put it inside the resource repository to avoid those problems, so we took the same approach here.

For example, S3 supports listing files along with the details for each file (notably LastModified), while FS only supports listing files, after which the extra details have to be requested for each file individually.
Since the two repositories implement the ResourceRepository interface, the external abstraction layer would need to work with both through the same functions (those in the interface). The problem is that each underlying repository has a different way of implementing this logic with optimized performance.

It makes more sense for the repository itself to be aware of the best-performing implementation than for the abstraction layer to intimately know the internals of each repository.
I did attempt the approach where the abstraction layer knows the internals and works optimally with each repository, but it made the abstraction layer's code a mess and still could not keep the repositories themselves completely free of extra code.

@Yshayy (Contributor) commented Jun 15, 2020


Since this component is not performance-critical (it's the control layer), I think it's better to use straightforward storage abstractions (for example, in the fs case, the cost of these reads should in general be really low). Glad to discuss it more.

In general, I think that:

export interface ResourceRepository {
    fetchLatest(): Promise<FetchLatestResult>;
    getResourceGroup(): ResourceGroup;
    update(rg: ResourceGroup): Promise<void>;
    writePolicyAttachment(filename: string, content: Buffer): Promise<void>;
    getPolicyAttachment(filename: string): Buffer;
    initializePolicyAttachments(): Promise<void>;
}

is using a higher-level abstraction than storage: there shouldn't be fs/s3 implementations of these methods; they should implement a lower-level abstraction. Even from an OOP perspective, this interface has too many reasons to change, and it behaves more like a header interface than a role interface. Since we already have two implementations, I think it's possible to extract the shared code from both to make sure we use the right abstractions.

Also, the original D2C service had support for loading files and CRDs using the same abstractions and it was quite simple.

In projects like gloo/sqoop that use similar control-plane concepts, storage is usually abstracted away (even if there aren't many implementations).
Although in Envoy, I think the contract is more API-driven than storage-driven.
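One possible shape for the lower-level role interface being argued for here (purely illustrative, not code from this PR): a minimal blob store that fs and S3 would each implement, with the policy-attachment caching and resource-group logic layered above it.

```typescript
// Hypothetical lower-level storage role: just blobs in, blobs out.
// fs and S3 would each implement this; policy-attachment caching and
// resource-group handling would live in a layer above it.
interface BlobStore {
  list(prefix: string): Promise<{ key: string; updatedAt: Date }[]>;
  read(key: string): Promise<Buffer>;
  write(key: string, content: Buffer): Promise<void>;
}

// Minimal in-memory implementation, the kind of thing tests would use.
class InMemoryBlobStore implements BlobStore {
  private blobs = new Map<string, { content: Buffer; updatedAt: Date }>();

  async list(prefix: string) {
    return [...this.blobs.entries()]
      .filter(([key]) => key.startsWith(prefix))
      .map(([key, { updatedAt }]) => ({ key, updatedAt }));
  }

  async read(key: string): Promise<Buffer> {
    const blob = this.blobs.get(key);
    if (!blob) throw new Error(`no such key: ${key}`);
    return blob.content;
  }

  async write(key: string, content: Buffer): Promise<void> {
    this.blobs.set(key, { content, updatedAt: new Date() });
  }
}
```

Note that `list` returning `updatedAt` papers over the fs/S3 asymmetry discussed above: the fs implementation would stat each file internally, keeping that detail out of the higher layer.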

@tomeresk (Contributor Author)


The problematic part is actually in the gateway, so performance does matter. It happens continuously in the background rather than in a specific request, so performance is not critical, but it should be reasonably good.
I don't think re-downloading all of the policies every 2 minutes would be acceptable here (even in the background, it would waste a lot of resources and network cost), so we do need to keep at least some of the optimizations.

That said, I think we can probably do better, likely even by having basic abstractions over each repository type and then an abstraction above them to manage the resources.
I think this is best done as a separate task that generally overhauls how we work with the repositories (since the same issue exists with the resource group as well).
It should be relatively straightforward to change the consumers of this interface, so most of the work would be the new repository implementations.
I'll open an issue about it.

@Yshayy (Contributor) commented Jun 15, 2020

We can open an issue :), it's definitely not a blocker.
In general, I don't think that in a high-throughput proxy, downloading (or reading from fs) several KBs every 2 minutes should have any meaningful performance/cost impact (I would be more wary of the cost of parsing, if this data is large/complex, because of Node).
I'm pretty sure that first-generation ingress solutions in Kubernetes did that, and even much worse.

@tomeresk (Contributor Author)


There is actually no parsing involved (except for the resource group, but that is only one file); these are WASM files, so we keep the data as it is read, in a Buffer (and provide it to OPA that way).
The problem is not really the size, but the fact that we have to download each file individually, with many S3 requests.
Realistically, even that won't matter for at least a couple of years, but by the time it becomes an issue it might be harder to fix (or even to discover that it is what's causing the issues). The effort to avoid it didn't seem that big.

@AvivRubys (Contributor)

Correct me if I'm wrong, and I might be, but it seems like the authorization subsystem basically doesn't work the same way as the rest of the system when it comes to resource updates.
The rest of the system is driven by one resource group: a schema is created for each new one, along with the appropriate resolvers, fields, etc., and the old ones are garbage collected.
OTOH, the authorization subsystem, mainly through PolicyExecutor but also in the sense that it is refreshed automatically, just gets a reference to the resource repository and fetches the resource group/attachments from it at runtime, basically side-stepping the mechanism of being driven by one resource group at a time.
What's the reasoning behind this?

@tomeresk (Contributor Author) commented Jun 15, 2020


I saw the mechanism you mention being used to update the GraphQL server on schema changes; however, the GraphQL server does not use the policies (or their attachments) directly, the way it uses the schema. Why would we want to apply the same logic to them?

@tomeresk (Contributor Author)


Discussed this with Aviv; I will make some changes to move this data into the RequestContext instead, and tie the policy updates into the other resource updates.
I will do this in another PR and merge this one now, in order to allow Alex to start his branch from the updated authorization branch.
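The direction agreed on here might look roughly like this (field names are hypothetical, not from the follow-up PR): policy definitions and compiled attachments are snapshotted onto the request context, so the executor reads from the context instead of reaching into the repository at resolve time.

```typescript
// Hypothetical RequestContext shape after the agreed change: policy
// definitions and compiled WASM attachments are snapshotted per request,
// tied to the same resource-group update cycle as the schema.
type PolicyDefinition = { name: string; args?: Record<string, unknown> };

interface RequestContext {
  policies: Map<string, PolicyDefinition>;
  policyAttachments: Map<string, Buffer>; // compiled WASM, keyed by filename
}

// The executor would then look up attachments from the snapshot,
// never touching the repository during a request.
function getAttachment(ctx: RequestContext, filename: string): Buffer {
  const wasm = ctx.policyAttachments.get(filename);
  if (!wasm) throw new Error(`missing policy attachment: ${filename}`);
  return wasm;
}
```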

@tomeresk tomeresk merged commit f64ce80 into authorization Jun 15, 2020
@tomeresk tomeresk deleted the authorization-gateway branch June 15, 2020 15:40
AleF83 added a commit that referenced this pull request Jul 2, 2020
* Authorization - fully implemented registry part (#133)

This includes:
Create/update policy resource
Attachments support for policy resource (with support for writing the attachment to both s3 and fs repositories)
Opa policy type implementation, including compiling rego code to wasm and adding that to the policy as an attachment

* Authorization gateway - basic features (#138)

implemented full flow with basic features
Implement local policy attachment caching for all resource repositories

* Add policy definitions and attachments to request context, change pol… (#141)

* Add policy definitions and attachments to request context, change policy executor to use them from context instead of directly from repo

* PR comments

* allow jwt in param injection (policy authorization can use it through args) (#144)

* Support for policy query (#143)

* Policy directive - accept only a single policy (#146)

* change policy directive to accept only a single policy

* Refactored PolicyExecutor API to only expose static methods

Co-authored-by: Tomer Eskenazi <tomeresk@gmail.com>
AleF83 pushed a commit that referenced this pull request Jul 2, 2020
implemented full flow with basic features
Implement local policy attachment caching for all resource repositories