docs: Update philosophy page to discuss the document model
When we refreshed the docs last year we removed the page titled 'how
OPA works' and in the process lost the description of the OPA document
model that explains the base and virtual document concepts. Since then
a bunch of people have referred back to those docs after getting
started.

This change re-introduces the model and provides an explanation of
_why_ the model exists.

This change also replaces the infographic with text containing the
same content, as that feels a bit more in line with the rest of the
page.

Fixes #2284

Signed-off-by: Torin Sandall <torinsandall@gmail.com>
tsandall committed Apr 21, 2020
1 parent e21a933 commit 6bf8653
Showing 5 changed files with 113 additions and 9 deletions.
1 change: 0 additions & 1 deletion docs/content/images/benefits.svg

This file was deleted.

1 change: 1 addition & 0 deletions docs/content/images/data-model.svg
112 changes: 108 additions & 4 deletions docs/content/philosophy.md
@@ -74,8 +74,112 @@ dynamically into OPA remotely via APIs or through the local filesystem.

## Why use OPA?

{{< figure src="benefits.svg" width="65" caption="Open Policy Agent: before and after" >}}

OPA is a full-featured policy engine that offloads policy decisions from your
service. You can think of it as a concierge for your service who can answer
detailed questions on behalf of your users to meet their specific needs.
software. You can think of it as a concierge for your software who can answer
detailed questions on behalf of your users to meet their specific needs.
OPA provides the building blocks for enabling better control and visibility over
policy in your systems.

Without OPA, you need to implement policy management for your software from scratch.
Required components such as the policy language (syntax _and_ semantics) and the
evaluation engine need to be carefully designed, implemented, tested, documented,
and then maintained to ensure correct behaviour and a positive user experience
for your customers. On top of that you must carefully consider security, tooling,
management, and more. That's a lot of work.

## How Does OPA Work?

See the [Introduction](..) for an overview of how OPA works and how to get started.

## The OPA Document Model

OPA policies (written in Rego) make decisions based on hierarchical structured data.
Sometimes we refer to this data as a document, set of attributes, piece of context,
or even just "JSON" [1]. Importantly, OPA policies can make decisions based on _arbitrary_
structured data. OPA itself is not tied to any particular domain model. Similarly,
OPA policies can represent decisions as arbitrary structured data (e.g., booleans,
strings, maps, maps of lists of maps, etc.).

Data can be loaded into OPA from the outside world using push or pull interfaces that operate
synchronously or asynchronously with respect to policy evaluation. We refer to all data
loaded into OPA from the outside world as **base documents** [2]. These base documents
almost always contribute to your policy decision-making logic. However, your policies can
also make decisions based on each other. Policies almost always consist of multiple rules
that refer to other rules (possibly authored by different groups). In OPA, we refer
to the values generated by rules (a.k.a., decisions) as **virtual documents**. The term
"virtual" in this case just means the document is _computed_ by the policy, i.e.,
it's not loaded into OPA from the outside world.

Base and virtual documents can represent the exact same kind of information, e.g., numbers,
strings, lists, maps, and so on. Moreover, with Rego, you can refer to both base and virtual
documents using the exact same dot/bracket-style reference syntax. Consistency across the
types of values that can be represented and the way those values are referenced means that
_policy authors only need to learn one way of modeling and referring to information
that drives policy decision-making_. Additionally, since there is no conceptual difference
in the types of values or the way you refer to those values in base and virtual documents,
Rego lets you refer to _both_ base and virtual documents through a global variable
called `data`. Similarly, OPA lets you query for both base and virtual documents via the
`/v1/data` HTTP API [3]. This is why queries for just `data` (or `data.foo` or `data.foo.bar`, etc.)
return the combination of base and virtual documents located under that path.
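
To make the single `data` namespace concrete, here is a toy model in Python. It is not OPA's implementation and not an OPA API: it merges one hypothetical base document with one hypothetical virtual document computed from it, then resolves references the way footnote [3] describes, by mapping a `/v1/data/...` path to a `data....` reference. The names `BASE`, `virtual_documents`, `data_tree`, `http_path_to_ref`, and `lookup` are all illustrative. The path mapping is also deliberately simplified and ignores URL-escaping and other details.

```python
import json

# Toy model only: a hypothetical base document, standing in for data an
# external system might load with PUT /v1/data/entitlements.
BASE = {
    "entitlements": {"alice": ["admin"], "bob": ["viewer"]},
}


def virtual_documents(base):
    """Compute a hypothetical virtual document from base documents.

    Stands in for a Rego rule declared in a package named `iam`; the value
    is computed from base data, not loaded from the outside world.
    """
    admins = sorted(
        user for user, roles in base["entitlements"].items() if "admin" in roles
    )
    return {"iam": {"admins": admins}}


def data_tree(base):
    """Combine base and virtual documents into a single `data` tree.

    Assumes the two never share a top-level key.
    """
    tree = dict(base)
    tree.update(virtual_documents(base))
    return tree


def http_path_to_ref(path):
    """Map an HTTP path such as /v1/data/foo/bar to a reference like data.foo.bar.

    Simplified: ignores URL-escaping, query parameters, and array indexes.
    """
    prefix = "/v1/data"
    if path != prefix and not path.startswith(prefix + "/"):
        raise ValueError(f"not a Data API path: {path}")
    parts = [p for p in path[len(prefix):].split("/") if p]
    return ".".join(["data"] + parts)


def lookup(tree, ref):
    """Resolve a dotted reference like data.iam.admins against the tree."""
    node = tree
    for key in ref.split(".")[1:]:  # skip the leading "data"
        node = node[key]
    return node


if __name__ == "__main__":
    tree = data_tree(BASE)
    for path in ["/v1/data", "/v1/data/entitlements", "/v1/data/iam/admins"]:
        ref = http_path_to_ref(path)
        print(ref, "->", json.dumps(lookup(tree, ref), sort_keys=True))
```

In this sketch, looking up the root `data` returns both the base `entitlements` document and the virtual `iam` document. Looking up a deeper path returns only the subtree under it.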

Since base documents come from outside of OPA, their location under `data` is controlled
by the software doing the loading. On the other hand, the location of virtual
documents under `data` is controlled by policies themselves using the `package` directive
in the language.

Base documents can be pushed or pulled into OPA _asynchronously_ by replicating data
into OPA when the state of the world changes. This can happen periodically or when some
event (like a database change notification) occurs. Base documents loaded asynchronously
are always accessed under the `data` global variable. On the other hand, base documents can
also be pushed or pulled into OPA _synchronously_ when your software queries OPA for policy
decisions. We refer to base documents pushed synchronously as "input". Policies can
access these inputs under the `input` global variable. To pull base documents during
policy evaluation, OPA exposes (and can be extended with custom) built-in functions like
`http.send`. Built-in function return values can be assigned to local variables and
surfaced in virtual documents. Data loaded synchronously is kept outside of `data` to
avoid naming conflicts.

The following table summarizes the different models for loading base documents into OPA,
how they can be referenced inside of policies, and the actual mechanism(s) for loading.

| Model | How to access in Rego | How to integrate with OPA |
| --- | --- | --- |
| Asynchronous Push | The `data` global variable | Invoke OPA's API(s), e.g., `PUT /v1/data` |
| Asynchronous Pull | The `data` global variable | Configure OPA's [Bundle](../management#bundles) feature |
| Synchronous Push | The `input` global variable | Provide data in policy query, e.g., inside the body of `POST /v1/data` |
| Synchronous Pull | The [built-in functions](../policy-reference), e.g., `http.send` | N/A |
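
The two push models in the table can be sketched in Python using only the standard library. This is a minimal sketch that assumes an OPA server listening on `http://localhost:8181` (OPA's default port). It only builds the requests and does not send them. The helper names `async_push_request` and `sync_push_request`, and the example documents, are hypothetical. The endpoints are the ones named in the table: `PUT /v1/data` for asynchronous push and `POST /v1/data` with an `input` body for synchronous push.

```python
import json
import urllib.request

# Assumption: OPA is running locally on its default port.
OPA_URL = "http://localhost:8181"


def async_push_request(path, document):
    """Build a PUT /v1/data/<path> request that writes a base document.

    Once loaded, policies read it under `data` (e.g. data.entitlements).
    """
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        f"{OPA_URL}/v1/data/{path}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def sync_push_request(path, input_doc):
    """Build a POST /v1/data/<path> policy query that carries `input`.

    The input is visible to policies only for this query, under `input`.
    """
    body = json.dumps({"input": input_doc}).encode("utf-8")
    return urllib.request.Request(
        f"{OPA_URL}/v1/data/{path}",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    put = async_push_request("entitlements", {"alice": ["admin"]})
    post = sync_push_request("acme/allow", {"user": "alice", "method": "GET"})
    for req in (put, post):
        print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Passing either request to `urllib.request.urlopen` would send it to a running OPA. The response to the `POST` query carries the decision in its `result` field.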

Data loaded asynchronously into OPA is cached in-memory so that it can be read efficiently
during policy evaluation. Similarly, policies are also cached in-memory to ensure
high performance and high availability. Data _pulled_ synchronously can also be
cached in-memory. For more information on loading external data into OPA, including tradeoffs,
see the [External Data](../external-data) page.

The following diagram illustrates the base and virtual document model described above for a
hypothetical policy that renders authorization decisions (named `data.acme.allow`) based on:

* API request information pushed synchronously and located under `input`.
* Entitlements data pulled asynchronously and located under `data.entitlements`.
* Resource data pulled synchronously during policy evaluation using the `http.send` built-in function.

The entitlements and resource information is _abstracted_ by rules that generate
virtual documents named `data.iam.user_has_role` and `data.acme.user_is_assigned` respectively.

<!--- source: https://docs.google.com/drawings/d/1KerjlOGRmsZvs2tqfhLh2CGGkNRFH0GWioBsHLHAuIg/edit --->

{{< figure src="data-model.svg" width="65" caption="Hypothetical Policy Document Model" >}}

> [1] OPA has excellent support for loading JSON and YAML because they are prevalent
> in modern systems; however, OPA is not tied to any particular data format. OPA
> uses its own internal representation for structures like maps and lists (a.k.a.
> objects and arrays in JSON).
>
> [2] The term "document" comes from the document-oriented database world. Document
> is just a generic term to refer to data or information encoded in some standard
> format like JSON, YAML, XML, etc. Document-oriented data does not have to adhere
> to a strict schema like data in the relational world. Documents are often deeply
> nested, hierarchical data structures containing several levels of embedded
> maps and lists.
>
> [3] Internally, HTTP requests like `GET /v1/data` or `GET /v1/data/foo/bar` are turned
> into Rego queries that are almost identical to the HTTP path (e.g., `data` or `data.foo.bar`).
6 changes: 3 additions & 3 deletions docs/content/policy-language.md
@@ -659,7 +659,7 @@ b

## Rules

Rules define the content of [Virtual Documents](../#rules-and-virtual-documents) in
Rules define the content of [Virtual Documents](../philosophy#how-does-opa-work) in
OPA. When OPA evaluates a rule, we say OPA *generates* the content of the
document that is defined by the rule.

@@ -956,7 +956,7 @@ s(5, 3)

## Negation

To generate the content of a [Virtual Document](../#rules-and-virtual-documents), OPA attempts to bind variables in the body of the rule such that all expressions in the rule evaluate to True.
To generate the content of a [Virtual Document](../philosophy#how-does-opa-work), OPA attempts to bind variables in the body of the rule such that all expressions in the rule evaluate to True.

This generates the correct result when the expressions represent assertions about what states should exist in the data stored in OPA. In some cases, you want to express that certain states *should not* exist in the data stored in OPA. In these cases, negation must be used.

@@ -1151,7 +1151,7 @@ Import statements declare dependencies that modules have on documents defined ou

All modules contain implicit statements which import the `data` and `input` documents.

Modules use the same syntax to declare dependencies on [Base Documents](../#base-documents) and [Virtual Documents](../#rules-and-virtual-documents).
Modules use the same syntax to declare dependencies on [Base and Virtual Documents](../philosophy#how-does-opa-work).

```live:import_data:module:read_only
package opa.examples
2 changes: 1 addition & 1 deletion docs/content/rest-api.md
@@ -710,7 +710,7 @@ Content-Type: application/json

## Data API

The Data API exposes endpoints for reading and writing documents in OPA. For an introduction to the different types of documents in OPA see [How Does OPA Work?](../#how-does-opa-work).
The Data API exposes endpoints for reading and writing documents in OPA. For an explanation of the different types of documents in OPA, see [How Does OPA Work?](../philosophy#how-does-opa-work).

### Get a Document
