docs: Update philosophy page to discuss the document model

When we refreshed the docs last year we removed the page titled 'how OPA works' and in the process lost the description of the OPA document model that explains the base and virtual document concepts. Since then a bunch of people have referred back to those docs after getting started. This change re-introduces the model and provides an explanation of _why_ the model exists.

This change also replaces the infographic with text containing the same content, as that feels a bit more in line with the rest of the page.

Fixes #2284

Signed-off-by: Torin Sandall <torinsandall@gmail.com>
open-policy-agent · Apr 21, 2020 · 6bf8653
1 parent e21a933
commit 6bf8653
Showing 5 changed files with 113 additions and 9 deletions.
diff --git a/docs/content/images/benefits.svg b/docs/content/images/benefits.svg
diff --git a/docs/content/images/data-model.svg b/docs/content/images/data-model.svg
diff --git a/docs/content/philosophy.md b/docs/content/philosophy.md
@@ -74,8 +74,112 @@ dynamically into OPA remotely via APIs or through the local filesystem.
 
 ## Why use OPA?
 
-{{< figure src="benefits.svg" width="65" caption="Open Policy Agent: before and after" >}}
-
 OPA is a full-featured policy engine that offloads policy decisions from your
-service. You can think of it as a concierge for your service who can answer
-detailed questions on behalf of your users to meet their specific needs.
+software. You can think of it as a concierge for your software who can answer
+detailed questions on behalf of your users to meet their specific needs.
+OPA provides the building blocks for enabling better control and visibility over
+policy in your systems.
+
+Without OPA, you need to implement policy management for your software from scratch.
+Required components such as the policy language (syntax _and_ semantics) and the
+evaluation engine need to be carefully designed, implemented, tested, documented,
+and then maintained to ensure correct behaviour and a positive user experience
+for your customers. On top of that you must carefully consider security, tooling,
+management, and more. That's a lot of work.
+
+## How Does OPA Work?
+
+See the [Introduction](..) for an overview of how OPA works and how to get started.
+
+## The OPA Document Model
+
+OPA policies (written in Rego) make decisions based on hierarchical structured data.
+Sometimes we refer to this data as a document, set of attributes, piece of context,
+or even just "JSON" [1]. Importantly, OPA policies can make decisions based on _arbitrary_
+structured data. OPA itself is not tied to any particular domain model. Similarly,
+OPA policies can represent decisions as arbitrary structured data (e.g., booleans,
+strings, maps, maps of lists of maps, etc.).
+
+Data can be loaded into OPA from the outside world using push or pull interfaces that operate
+synchronously or asynchronously with respect to policy evaluation. We refer to all data
+loaded into OPA from the outside world as **base documents** [2]. These base documents
+almost always contribute to your policy decision-making logic. However, your policies can
+also make decisions based on each other. Policies almost always consist of multiple rules
+that refer to other rules (possibly authored by different groups). In OPA, we refer
+to the values generated by rules (a.k.a., decisions) as **virtual documents**. The term
+"virtual" in this case just means the document is _computed_ by the policy, i.e.,
+it's not loaded into OPA from the outside world.
+
+Base and virtual documents can represent the exact same kind of information, e.g., numbers,
+strings, lists, maps, and so on. Moreover, with Rego, you can refer to both base and virtual
+documents using the exact same dot/bracket-style reference syntax. Consistency across the
+types of values that can be represented and the way those values are referenced means that
+_policy authors only need to learn one way of modeling and referring to information
+that drives policy decision-making_. Additionally, since there is no conceptual difference
+in the types of values or the way you refer to those values in base and virtual documents,
+Rego lets you refer to _both_ base and virtual documents through a global variable
+called `data`. Similarly, OPA lets you query for both base and virtual documents via the
+`/v1/data` HTTP API [3]. This is why queries for just `data` (or `data.foo` or `data.foo.bar`, etc.)
+return the combination of base and virtual documents located under that path.
+
+Since base documents come from outside of OPA, their location under `data` is controlled
+by the software doing the loading. On the other hand, the location of virtual
+documents under `data` is controlled by policies themselves using the `package` directive
+in the language.
+
+Base documents can be pushed or pulled into OPA _asynchronously_ by replicating data
+into OPA when the state of the world changes. This can happen periodically or when some
+event (like a database change notification) occurs. Base documents loaded asynchronously
+are always accessed under the `data` global variable. On the other hand, base documents can
+also be pushed or pulled into OPA _synchronously_ when your software queries OPA for policy
+decisions. We refer to base documents pushed synchronously as "input". Policies can
+access these inputs under the `input` global variable. To pull base documents during
+policy evaluation, OPA exposes (and can be extended with custom) built-in functions like
+`http.send`. Built-in function return values can be assigned to local variables and
+surfaced in virtual documents. Data loaded synchronously is kept outside of `data` to
+avoid naming conflicts.
+
+The following table summarizes the different models for loading base documents into OPA,
+how they can be referenced inside of policies, and the actual mechanism(s) for loading.
+
+| Model | How to access in Rego | How to integrate with OPA |
+| --- | --- | --- |
+| Asynchronous Push | The `data` global variable | Invoke OPA's API(s), e.g., `PUT /v1/data` |
+| Asynchronous Pull | The `data` global variable | Configure OPA's [Bundle](../management#bundles) feature |
+| Synchronous Push | The `input` global variable | Provide data in policy query, e.g., inside the body of `POST /v1/data` |
+| Synchronous Pull | The [built-in functions](../policy-reference), e.g., `http.send` | N/A |
+
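To make the two push rows of the table above concrete, here is a minimal Python sketch that only builds the HTTP requests and does not send them. The base URL `http://localhost:8181`, the `entitlements` and `acme/allow` paths, the example documents, and the helper names are all assumptions for illustration; the `PUT /v1/data` and `POST /v1/data` endpoints come from the table, and the `{"input": ...}` body shape follows OPA's REST API.

```python
import json
import urllib.request

OPA_URL = "http://localhost:8181"  # assumed address of a local OPA server


def async_push_request(path, document):
    """Build a PUT /v1/data/<path> request that loads a base document under `data`."""
    return urllib.request.Request(
        url=f"{OPA_URL}/v1/data/{path}",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def sync_push_request(path, input_document):
    """Build a POST /v1/data/<path> query; the payload is exposed to policies as `input`."""
    return urllib.request.Request(
        url=f"{OPA_URL}/v1/data/{path}",
        data=json.dumps({"input": input_document}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical example documents, not taken from the docs.
push = async_push_request("entitlements", {"alice": ["admin"]})
query = sync_push_request("acme/allow", {"user": "alice", "method": "GET"})
```

Passing either request to `urllib.request.urlopen` would send it to a running OPA. The first call replicates data ahead of time, while the second supplies data at decision time, which is why the same endpoint family appears in both the asynchronous and synchronous rows.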
+Data loaded asynchronously into OPA is cached in-memory so that it can be read efficiently
+during policy evaluation. Similarly, policies are also cached in-memory to ensure
+high performance and high availability. Data _pulled_ synchronously can also be
+cached in-memory. For more information on loading external data into OPA, including tradeoffs,
+see the [External Data](../external-data) page.
+
+The following diagram illustrates the base and virtual document model described above for a
+hypothetical policy that renders authorization decisions (named `data.acme.allow`) based on:
+
+* API request information pushed synchronously located under `input`.
+* Entitlements data pulled asynchronously and located under `data.entitlements`.
+* Resource data pulled synchronously during policy evaluation using the `http.send` built-in function.
+
+The entitlements and resource information is _abstracted_ by rules that generate
+virtual documents named `data.iam.user_has_role` and `data.acme.user_is_assigned` respectively.
+
+<!--- source: https://docs.google.com/drawings/d/1KerjlOGRmsZvs2tqfhLh2CGGkNRFH0GWioBsHLHAuIg/edit --->
+
+{{< figure src="data-model.svg" width="65" caption="Hypothetical Policy Document Model" >}}
+
+> [1] OPA has excellent support for loading JSON and YAML because they are prevalent
+> in modern systems; however, OPA is not tied to any particular data format. OPA
+> uses its own internal representation for structures like maps and lists (a.k.a.,
+> objects and arrays in JSON).
+
+> [2] The term "document" comes from the document-oriented database world. Document
+> is just a generic term to refer to data or information encoded in some standard
+> format like JSON, YAML, XML, etc. Document-oriented data does not have to adhere
+> to a strict schema like data in the relational world. Documents are often deeply
+> nested, hierarchical data structures containing several levels of embedded
+> maps and lists.
+
+> [3] Internally, HTTP requests like `GET /v1/data` or `GET /v1/data/foo/bar` are turned
+> into Rego queries that are almost identical to the HTTP path (e.g., `data` or `data.foo.bar`).
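
The path-to-query correspondence described in footnote [3] can be sketched in a few lines of Python. This is a simplified illustration and not OPA's actual implementation: the function name `data_path_to_query` is hypothetical, and the sketch assumes every path segment is a plain identifier that can be joined with dots.

```python
def data_path_to_query(http_path):
    """Translate a Data API path such as /v1/data/foo/bar into a reference like data.foo.bar."""
    prefix = "/v1/data"
    # Accept exactly /v1/data or anything nested beneath it; reject look-alikes.
    if http_path != prefix and not http_path.startswith(prefix + "/"):
        raise ValueError(f"not a Data API path: {http_path!r}")
    # Drop empty segments so a trailing slash does not produce an empty key.
    segments = [s for s in http_path[len(prefix):].split("/") if s]
    return ".".join(["data", *segments])
```

Under these assumptions, `data_path_to_query("/v1/data")` yields `data` and `data_path_to_query("/v1/data/foo/bar")` yields `data.foo.bar`, matching the examples in the footnote.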
diff --git a/docs/content/policy-language.md b/docs/content/policy-language.md
@@ -659,7 +659,7 @@ b
 
 ## Rules
 
-Rules define the content of [Virtual Documents](../#rules-and-virtual-documents) in
+Rules define the content of [Virtual Documents](../philosophy#how-does-opa-work) in
 OPA. When OPA evaluates a rule, we say OPA *generates* the content of the
 document that is defined by the rule.
 
@@ -956,7 +956,7 @@ s(5, 3)
 
 ## Negation
 
-To generate the content of a [Virtual Document](../#rules-and-virtual-documents), OPA attempts to bind variables in the body of the rule such that all expressions in the rule evaluate to True.
+To generate the content of a [Virtual Document](../philosophy#how-does-opa-work), OPA attempts to bind variables in the body of the rule such that all expressions in the rule evaluate to True.
 
 This generates the correct result when the expressions represent assertions about what states should exist in the data stored in OPA. In some cases, you want to express that certain states *should not* exist in the data stored in OPA. In these cases, negation must be used.
 
@@ -1151,7 +1151,7 @@ Import statements declare dependencies that modules have on documents defined ou
 
 All modules contain implicit statements which import the `data` and `input` documents.
 
-Modules use the same syntax to declare dependencies on [Base Documents](../#base-documents) and [Virtual Documents](../#rules-and-virtual-documents).
+Modules use the same syntax to declare dependencies on [Base and Virtual Documents](../philosophy#how-does-opa-work).
 
 ```live:import_data:module:read_only
 package opa.examples

diff --git a/docs/content/rest-api.md b/docs/content/rest-api.md
@@ -710,7 +710,7 @@ Content-Type: application/json
 
 ## Data API
 
-The Data API exposes endpoints for reading and writing documents in OPA. For an introduction to the different types of documents in OPA see [How Does OPA Work?](../#how-does-opa-work).
+The Data API exposes endpoints for reading and writing documents in OPA. For an explanation of the different types of documents in OPA see [How Does OPA Work?](../philosophy#how-does-opa-work).
 
 ### Get a Document