
5. Entity Structure

Paula Gearon edited this page Jun 30, 2021 · 1 revision

Overview

Asami allows Clojure objects to be written to the graph and retrieved. These objects are referred to as Entities, to align with a similar concept in Datomic.

This page discusses how these data structures are written to the database, so that users can understand how to query and modify them, and how to work within any limitations of the design.

Entities may also have more complicated structures, containing nested elements. These include other objects or arrays. Arrays can contain scalar values, such as numbers and strings, but can also contain other nested elements, such as objects and arrays.

The original purpose of Asami's entity implementation was to store and retrieve JSON documents, and some of the design decisions in Asami reflect this.

Basic Structure

Asami breaks entities up into datoms (tuples containing an entity/attribute/value) which are then stored in the graph.

Each entity object is allocated a node (unless one is provided), which is used as the first element of each datom. The attributes and values of the entity object fill in the other two positions. Finally, a couple of extra datoms containing system information are added to help manage entities in the graph.

Consider a small object to be converted into tuples:

{:name "Fitzwilliam" :home "Pemberley"}

The first step is to allocate a node for the object, for instance :tg/node-14842. The resulting tuples are:

:tg/node-14842 :name "Fitzwilliam"
:tg/node-14842 :home "Pemberley"

The graph for this is very small.

[Figure: Simple 3-node graph]

Since an identifier was not provided, the system also adds a :db/ident statement that makes the node its own identifier. Finally, a :tg/entity property with the value true is added to indicate that the node is an entity. This makes it easy to distinguish entities from other nodes in the database.
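As a rough illustration of these steps, the following plain-Clojure sketch converts a flat map into triples for a node that has already been allocated. `flat-entity->triples` is a hypothetical helper, not part of Asami or Zuko, and it does not handle nested objects or arrays.

```clojure
(defn flat-entity->triples
  "Hypothetical sketch: converts a flat entity map into [node attribute value]
  triples, then adds the self-identifier and :tg/entity marker."
  [node entity]
  (let [attr-triples (for [[a v] entity] [node a v])
        ;; only make the node its own identifier when no :db/ident was given
        ident-triple (when-not (contains? entity :db/ident)
                       [[node :db/ident node]])]
    (vec (concat attr-triples
                 ident-triple
                 [[node :tg/entity true]]))))

(flat-entity->triples :tg/node-14842 {:name "Fitzwilliam" :home "Pemberley"})
;; => [[:tg/node-14842 :name "Fitzwilliam"]
;;     [:tg/node-14842 :home "Pemberley"]
;;     [:tg/node-14842 :db/ident :tg/node-14842]
;;     [:tg/node-14842 :tg/entity true]]
```

Passing a map that already contains :db/ident skips the self-identifier triple, which mirrors the behavior described in the Identifiers section.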

We can see all of this in action if we add the entity to a database and read back the datoms that were inserted, by inspecting the :tx-data field from the result (see transact for details):

;; initialize the database
(require '[asami.core :as d])
(require '[clojure.pprint :refer [pprint]])
(def db-uri "asami:mem://entity-data")
(d/create-database db-uri)
(def conn (d/connect db-uri))

;; load the data
(def data [{:name "Fitzwilliam" :home "Pemberley"}])
(def tx (d/transact conn {:tx-data data}))
(pprint (:tx-data @tx))

The allocated node may differ, but the results should appear like this:

(#datom [:tg/node-14842 :name "Fitzwilliam" 1 true]
 #datom [:tg/node-14842 :home "Pemberley" 1 true]
 #datom [:tg/node-14842 :db/ident :tg/node-14842 1 true]
 #datom [:tg/node-14842 :tg/entity true 1 true])

The graph of this full structure is then:

[Figure: Fitzwilliam entity graph]

Identifiers

:db/ident

Alternatively, a value for :db/ident may be provided explicitly. In this case, the node is not made its own identifier:

(def data [{:db/ident "lizzy" :name "Elizabeth" :home "Longbourn"}])
(def tx (d/transact conn {:tx-data data}))
(pprint (:tx-data @tx))

This is the second transaction, so the transaction ID on the resulting datoms has incremented:

(#datom [:tg/node-14845 :db/ident "lizzy" 2 true]
 #datom [:tg/node-14845 :name "Elizabeth" 2 true]
 #datom [:tg/node-14845 :home "Longbourn" 2 true]
 #datom [:tg/node-14845 :tg/entity true 2 true])

:db/id

The node to use can also be set by using the :db/id attribute.

(def data [{:db/id :tg/node-mynode :name "Jane" :home "Longbourn"}])
(def tx (d/transact conn {:tx-data data}))
(pprint (:tx-data @tx))

The resulting datoms use the specified node. Since no :db/ident was provided, the node is also made its own identifier:

(#datom [:tg/node-mynode :db/ident :tg/node-mynode 3 true]
 #datom [:tg/node-mynode :tg/entity true 3 true]
 #datom [:tg/node-mynode :name "Jane" 3 true]
 #datom [:tg/node-mynode :home "Longbourn" 3 true])
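A minimal sketch of how the node could be chosen, assuming a simple counter-based allocator. `node-counter`, `next-node!`, and `resolve-node` are hypothetical names for illustration only; Asami's real node allocation works differently. The :db/id value is removed from the remaining attributes, consistent with the output above, which contains no :db/id datom.

```clojure
;; hypothetical counter; Asami's real node allocation differs
(def node-counter (atom 14000))

(defn next-node!
  "Hypothetical allocator producing keywords like :tg/node-14001."
  []
  (keyword "tg" (str "node-" (swap! node-counter inc))))

(defn resolve-node
  "Returns [node attrs]: the node named by :db/id if present, otherwise a
  freshly allocated node, together with the remaining attributes."
  [entity]
  (if-let [id (:db/id entity)]
    [id (dissoc entity :db/id)]
    [(next-node!) entity]))

(resolve-node {:db/id :tg/node-mynode :name "Jane" :home "Longbourn"})
;; => [:tg/node-mynode {:name "Jane", :home "Longbourn"}]
```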

Nested Objects

Objects can be nested in an entity, and are encoded in nearly the same way as the entity:

(def data [{:db/ident "catherine"
            :name "Catherine"
            :home {:name "Rosings Park"
                   :village "Rosings"
                   :town "Hunsford"
                   :county "Kent"}}])
(def tx (d/transact conn {:tx-data data}))
(pprint (:tx-data @tx))

The value of the :home attribute is an object, so it is assigned its own node. In this example, the node assigned to the sub-object is :tg/node-14851.

(#datom [:tg/node-14850 :db/ident "catherine" 4 true]
 #datom [:tg/node-14850 :name "Catherine" 4 true]
 #datom [:tg/node-14851 :name "Rosings Park" 4 true]
 #datom [:tg/node-14851 :village "Rosings" 4 true]
 #datom [:tg/node-14851 :town "Hunsford" 4 true]
 #datom [:tg/node-14851 :county "Kent" 4 true]
 #datom [:tg/node-14850 :tg/owns :tg/node-14851 4 true]
 #datom [:tg/node-14850 :home :tg/node-14851 4 true]
 #datom [:tg/node-14850 :tg/entity true 4 true])

[Figure: Nested entity graph]

Unlike the top-level entity, the sub-object does not have the :tg/entity property. The sub-object is also "owned" by the top-level entity through the :tg/owns attribute. This makes it easy to identify all objects associated with an entity. Otherwise, sub-objects are encoded identically to top-level entities.
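The encoding of nested objects can be sketched recursively. In this hypothetical `encode-entity` (not Zuko's implementation), `alloc` is any no-argument function that returns a fresh node. Each nested map gets its own node, is linked from its parent by the attribute, and is owned by the top-level entity. The self-identifier step is omitted for brevity.

```clojure
(defn encode-entity
  "Hypothetical sketch of nested-object encoding. Returns a vector of
  [s p o] triples; `alloc` returns a fresh node on each call."
  [alloc entity]
  (let [owner (alloc)]
    (letfn [(encode-map [node m]
              (reduce-kv
               (fn [acc a v]
                 (if (map? v)
                   ;; a nested object gets its own node, owned by the top-level entity
                   (let [child (alloc)]
                     (-> acc
                         (into (encode-map child v))
                         (conj [owner :tg/owns child] [node a child])))
                   (conj acc [node a v])))
               []
               m))]
      ;; only the top-level entity is marked with :tg/entity
      (conj (encode-map owner entity) [owner :tg/entity true]))))

(let [counter (atom 14849)
      alloc   (fn [] (keyword "tg" (str "node-" (swap! counter inc))))]
  (encode-entity alloc {:db/ident "catherine"
                        :name "Catherine"
                        :home {:name "Rosings Park"
                               :village "Rosings"
                               :town "Hunsford"
                               :county "Kent"}}))
```

With the counter starting at 14849, the triples produced match the Catherine datoms above, in the same order, ignoring the transaction ID and flag.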

Nested References

Nested objects can be given identifiers like any other object. This is particularly useful for later reference:

(def data [{:db/ident "charles"
            :name "Charles"
            :home {:db/ident "scarborough"
                   :town "Scarborough"
                   :county "Yorkshire"}}
           {:db/ident "jane"
            :name "Jane"
            :home {:db/ident "scarborough"}}])
(def tx (d/transact conn {:tx-data data}))
(pprint (:tx-data @tx))

Jane's home does not define an entire object; instead, it references an existing object by its :db/ident (a temporary :db/id value can also be used).

The result creates node :tg/node-14855 for Charles's home, and then re-uses this node for Jane:

(#datom [:tg/node-14854 :db/ident "charles" 5 true]
 #datom [:tg/node-14854 :name "Charles" 5 true]
 #datom [:tg/node-14855 :db/ident "scarborough" 5 true]
 #datom [:tg/node-14855 :town "Scarborough" 5 true]
 #datom [:tg/node-14855 :county "Yorkshire" 5 true]
 #datom [:tg/node-14854 :tg/owns :tg/node-14855 5 true]
 #datom [:tg/node-14854 :home :tg/node-14855 5 true]
 #datom [:tg/node-14854 :tg/entity true 5 true]
 #datom [:tg/node-14856 :db/ident "jane" 5 true]
 #datom [:tg/node-14856 :name "Jane" 5 true]
 #datom [:tg/node-14856 :tg/owns :tg/node-14855 5 true]
 #datom [:tg/node-14856 :home :tg/node-14855 5 true]
 #datom [:tg/node-14856 :tg/entity true 5 true])

[Figure: Entities with shared sub-object]

We can see that this data is shared by both entities when we read them:

=> (d/entity (d/db conn) "charles")
{:name "Charles", :home {:town "Scarborough", :county "Yorkshire"}}
=> (d/entity (d/db conn) "jane")
{:name "Jane", :home {:town "Scarborough", :county "Yorkshire"}}

Also, by providing a reference, we can ask for the object directly, despite it not being a top-level entity:

=> (d/entity (d/db conn) "scarborough")
{:town "Scarborough", :county "Yorkshire"}

Note that both the "jane" and "charles" nodes are connected to the :tg/node-14855 home node with :tg/owns. This dual ownership allows Asami to track that both entities reference this node. Typically, if a node like :tg/node-14856 were deleted, every node that it owns would also be deleted. However, dual ownership like this should ensure that the :tg/node-14855 node is left alone.
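The ownership rule described above can be expressed over plain triples. This is a hypothetical sketch of the rule as stated here, not Asami's deletion code: `deletable-owned-nodes` returns the nodes owned by a node being deleted that no other node also owns.

```clojure
(require '[clojure.set :as set])

(defn owned-by
  "Set of nodes that `node` owns via :tg/owns in a collection of [s p o] triples."
  [triples node]
  (set (for [[s p o] triples
             :when (and (= s node) (= p :tg/owns))]
         o)))

(defn deletable-owned-nodes
  "Hypothetical rule: owned nodes that may be removed along with `node`,
  i.e. those that no other node also owns."
  [triples node]
  (let [owned-elsewhere (set (for [[s p o] triples
                                   :when (and (= p :tg/owns) (not= s node))]
                               o))]
    (set/difference (owned-by triples node) owned-elsewhere)))

;; ownership statements from the Charles and Jane example
(def shared
  [[:tg/node-14854 :tg/owns :tg/node-14855]
   [:tg/node-14854 :home :tg/node-14855]
   [:tg/node-14856 :tg/owns :tg/node-14855]
   [:tg/node-14856 :home :tg/node-14855]])

(deletable-owned-nodes shared :tg/node-14856)
;; => #{}   (Charles's node still owns the home)
```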

Top Level Entity References

Top-level entities can also be referenced inside an object. However, because top-level entities are treated specially, they are not expanded when an entity is retrieved; a reference is returned instead.

(def data [{:db/ident "anne"
            :name "Anne"
            :sister {:db/ident "catherine"}}])
(def tx (d/transact conn {:tx-data data}))
(pprint (:tx-data @tx))

This refers to Anne's sister Catherine (defined earlier) by directly referencing her node, in the same way that Jane and Charles referred to the same home:

(#datom [:tg/node-14865 :db/ident "anne" 6 true]
 #datom [:tg/node-14865 :name "Anne" 6 true]
 #datom [:tg/node-14865 :sister :tg/node-14850 6 true]
 #datom [:tg/node-14865 :tg/entity true 6 true])

[Figure: Graph showing Anne referencing her sister Catherine]

However, because Catherine is a top-level entity, her details are not nested when Anne's entity is loaded. Instead, we get a reference to her entity:

=> (d/entity (d/db conn) "anne")
{:name "Anne", :sister #:db{:ident "catherine"}}

This behavior can be overridden by providing an optional nested? flag to the call to the entity function:

=> (d/entity (d/db conn) "anne" true)
{:name "Anne", :sister {:name "Catherine", :home {:name "Rosings Park", :village "Rosings", :town "Hunsford", :county "Kent"}}}
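The choice between returning a reference and expanding the value can be sketched as follows. `top-level?`, `ident-of`, and `expand-ref` are hypothetical helpers over plain triples, not Asami's API; `expand` stands in for whatever builds the full nested map. Note that `{:db/ident "catherine"}` is the same map as `#:db{:ident "catherine"}`.

```clojure
(defn top-level?
  "True when `node` carries the :tg/entity marker."
  [triples node]
  (some (fn [[s p o]] (and (= s node) (= p :tg/entity) (true? o))) triples))

(defn ident-of
  "The :db/ident value of `node`, if any."
  [triples node]
  (some (fn [[s p o]] (when (and (= s node) (= p :db/ident)) o)) triples))

(defn expand-ref
  "Hypothetical sketch: top-level entities become {:db/ident ...} references
  unless nested? is true; everything else is expanded with `expand`."
  [triples node nested? expand]
  (if (and (top-level? triples node) (not nested?))
    {:db/ident (ident-of triples node)}
    (expand node)))

(def catherine-triples
  [[:tg/node-14850 :db/ident "catherine"]
   [:tg/node-14850 :name "Catherine"]
   [:tg/node-14850 :tg/entity true]
   [:tg/node-14851 :name "Rosings Park"]])

(expand-ref catherine-triples :tg/node-14850 false (constantly :expanded))
;; => {:db/ident "catherine"}
```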

Structural Loops

Allowing references across the data like this can form loops. These are detected, and an object will not be re-loaded. For instance, in the following definition, Catherine is Mrs Bennet's daughter, and Mrs Bennet is Catherine's mother:

(def data [{:db/ident "family"
            :mother {:db/ident "mbennet" :name "Mrs Bennet" :daughter {:db/ident "kitty"}}
            :child {:db/ident "kitty" :name "Catherine" :parent {:db/ident "mbennet"}}}])
(def tx (d/transact conn {:tx-data data}))
(pprint (:tx-data @tx))
(#datom [:tg/node-14872 :db/ident "family" 7 true]
 #datom [:tg/node-14873 :db/ident "mbennet" 7 true]
 #datom [:tg/node-14873 :name "Mrs Bennet" 7 true]
 #datom [:tg/node-14874 :db/ident "kitty" 7 true]
 #datom [:tg/node-14872 :tg/owns :tg/node-14874 7 true]
 #datom [:tg/node-14873 :daughter :tg/node-14874 7 true]
 #datom [:tg/node-14872 :tg/owns :tg/node-14873 7 true]
 #datom [:tg/node-14872 :mother :tg/node-14873 7 true]
 #datom [:tg/node-14874 :name "Catherine" 7 true]
 #datom [:tg/node-14874 :parent :tg/node-14873 7 true]
 #datom [:tg/node-14872 :tg/owns :tg/node-14874 7 true]
 #datom [:tg/node-14872 :child :tg/node-14874 7 true]
 #datom [:tg/node-14872 :tg/entity true 7 true])

[Figure: Loop relationship between Catherine and Mrs Bennet]

This forms a loop between Mrs Bennet at :tg/node-14873 and her daughter Catherine at :tg/node-14874. Without loop detection, trying to load either of these entities would loop forever. Loops can terminate at slightly different places, depending on where an object is encountered. We can see this by comparing the entities loaded for Mrs Bennet, Catherine, and the family:

=> (d/entity (d/db conn) "family")
{:mother {:name "Mrs Bennet", :daughter {:name "Catherine"}}, :child {:name "Catherine", :parent {:name "Mrs Bennet"}}}
=> (d/entity (d/db conn) "mbennet")
{:name "Mrs Bennet", :daughter {:name "Catherine", :parent {:name "Mrs Bennet"}}}
=> (d/entity (d/db conn) "kitty")
{:name "Catherine", :parent {:name "Mrs Bennet", :daughter {:name "Catherine"}}}
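One way to detect loops is to track the nodes on the current path while rebuilding a nested map from triples. The sketch below uses a simplified, hypothetical rule (`triples->entity` is not Asami's function): when a node already on the current path is reached again, only its scalar attributes are returned. Asami's exact stopping points may differ from this rule.

```clojure
(defn node?
  "In this sketch, nodes are keywords in the \"tg\" namespace."
  [v]
  (and (keyword? v) (= "tg" (namespace v))))

(defn system-attr?
  "Attributes skipped when rebuilding entities: :db/ident and :tg/*."
  [a]
  (or (= a :db/ident) (= "tg" (namespace a))))

(defn triples->entity
  "Rebuilds a nested map for `node` from [s p o] triples. When a node already
  on the current path is reached again, only its scalar attributes are kept."
  ([triples node] (triples->entity triples node #{}))
  ([triples node path]
   (let [revisit? (contains? path node)
         path     (conj path node)]
     (reduce (fn [m [s p o]]
               (cond
                 (or (not= s node) (system-attr? p)) m
                 ;; follow references, unless this node closes a loop
                 (node? o) (if revisit?
                             m
                             (assoc m p (triples->entity triples o path)))
                 :else (assoc m p o)))
             {}
             triples))))

(def loop-triples
  [[:tg/node-14872 :db/ident "family"]
   [:tg/node-14873 :db/ident "mbennet"]
   [:tg/node-14873 :name "Mrs Bennet"]
   [:tg/node-14874 :db/ident "kitty"]
   [:tg/node-14873 :daughter :tg/node-14874]
   [:tg/node-14872 :tg/owns :tg/node-14873]
   [:tg/node-14872 :mother :tg/node-14873]
   [:tg/node-14874 :name "Catherine"]
   [:tg/node-14874 :parent :tg/node-14873]
   [:tg/node-14872 :tg/owns :tg/node-14874]
   [:tg/node-14872 :child :tg/node-14874]
   [:tg/node-14872 :tg/entity true]])

(triples->entity loop-triples :tg/node-14873)
;; => {:name "Mrs Bennet", :daughter {:name "Catherine", :parent {:name "Mrs Bennet"}}}
```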

Arrays

The other kind of structured data in an entity is an array. These are important both for storing multiple items for a single attribute, and for ordering items in a series.

While a graph can encode multiple values for a single attribute, these have the semantics of a set rather than an array. There are two ways in which this does not match the behavior of an array:

  • Order is not preserved.
  • Duplicate items are lost (this is not true if a multi-graph is in use).

Asami avoids these issues by storing an array as a linked list. However, this can make searching the array difficult in a standard query, so extra statements are also added to make querying easier.

Creating arrays is discussed in the Transactions page, while appending to arrays appears in the Append Annotation section.

Simple Array

When an array is encoded, a node is created to represent the list for that array. This node forms the head of the linked list. It has a :tg/rest attribute that refers to the next node in the list. Nodes are daisy-chained together in this way, one for each element of the list. The node without a :tg/rest attribute is the end of the list.

[Figure: Nodes forming a Linked List]

The values for this list are then linked to each node by the :tg/first attribute. For instance, if this list contained the strings ["one", "two", "three", "four"] then it would appear as follows:

[:tg/node-14884 :tg/first "one"]
[:tg/node-14884 :tg/rest :tg/node-14885]
[:tg/node-14885 :tg/first "two"]
[:tg/node-14885 :tg/rest :tg/node-14886]
[:tg/node-14886 :tg/first "three"]
[:tg/node-14886 :tg/rest :tg/node-14887]
[:tg/node-14887 :tg/first "four"]

[Figure: Linked List with strings]
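Encoding the values as a linked list can be sketched as follows, assuming the nodes have already been allocated. `list-triples` is a hypothetical helper, not Zuko's code.

```clojure
(defn list-triples
  "Encodes `values` as a linked list, using the nodes in `nodes` in order
  (one node per value). Returns a vector of [s p o] triples."
  [nodes values]
  (let [pairs (map vector nodes values)
        ;; the node that follows each element; nil for the last element
        nexts (concat (map first (rest pairs)) [nil])]
    (vec (mapcat (fn [[node v] next-node]
                   (cond-> [[node :tg/first v]]
                     next-node (conj [node :tg/rest next-node])))
                 pairs
                 nexts))))

(list-triples [:tg/node-14884 :tg/node-14885 :tg/node-14886 :tg/node-14887]
              ["one" "two" "three" "four"])
;; => [[:tg/node-14884 :tg/first "one"]
;;     [:tg/node-14884 :tg/rest :tg/node-14885]
;;     [:tg/node-14885 :tg/first "two"]
;;     [:tg/node-14885 :tg/rest :tg/node-14886]
;;     [:tg/node-14886 :tg/first "three"]
;;     [:tg/node-14886 :tg/rest :tg/node-14887]
;;     [:tg/node-14887 :tg/first "four"]]
```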

Containership

Asami also adds extra statements to make it easier to find all of the elements in a list while querying. The first node that represents the list is linked to each value in the list by the :tg/contains attribute.

The final structure looks like this:

[:tg/node-14884 :tg/first "one"]
[:tg/node-14884 :tg/rest :tg/node-14885]
[:tg/node-14885 :tg/first "two"]
[:tg/node-14885 :tg/rest :tg/node-14886]
[:tg/node-14886 :tg/first "three"]
[:tg/node-14886 :tg/rest :tg/node-14887]
[:tg/node-14887 :tg/first "four"]
[:tg/node-14884 :tg/contains "one"]
[:tg/node-14884 :tg/contains "two"]
[:tg/node-14884 :tg/contains "three"]
[:tg/node-14884 :tg/contains "four"]

[Figure: Complete Linked List structure]
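The :tg/contains statements hold the same values as the list itself, so they can be derived by walking the list from its head. The hypothetical `list-values` and `contains-triples` helpers below sketch this over plain triples, and assume a well-formed list (no cycles, and a single :tg/first and :tg/rest per node).

```clojure
(defn list-values
  "Follows :tg/rest from `head` and returns the :tg/first values in order."
  [triples head]
  (let [index (reduce (fn [m [s p o]] (assoc-in m [s p] o)) {} triples)]
    (loop [node head
           acc  []]
      (if (nil? node)
        acc
        (recur (get-in index [node :tg/rest])
               (conj acc (get-in index [node :tg/first])))))))

(defn contains-triples
  "Derives [head :tg/contains value] triples for every value in the list."
  [triples head]
  (mapv (fn [v] [head :tg/contains v]) (list-values triples head)))

(def list-data
  [[:tg/node-14884 :tg/first "one"]
   [:tg/node-14884 :tg/rest :tg/node-14885]
   [:tg/node-14885 :tg/first "two"]
   [:tg/node-14885 :tg/rest :tg/node-14886]
   [:tg/node-14886 :tg/first "three"]
   [:tg/node-14886 :tg/rest :tg/node-14887]
   [:tg/node-14887 :tg/first "four"]])

(list-values list-data :tg/node-14884)
;; => ["one" "two" "three" "four"]
```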

A Note about Zuko

All of the functionality for encoding and decoding entities is provided by an external library called Zuko. Along with Asami, Zuko was initially developed as part of the Naga rules engine. Zuko is kept separate from Asami because the functionality that Asami needs is also needed by other applications that don't use the Asami database.

The structures described on this page form part of the documentation for Zuko. This document also applies to integrations with other databases, with one important exception.

Some databases, such as Datomic, require each attribute to refer to values of only one specific data type. This makes it impossible for a single attribute like :tg/first or :tg/contains to refer to an arbitrary type of array entry. If the array contains numbers, then a different pair of attributes is needed. Similarly, if the array contains other entities, a pair of attributes that can refer to entities is needed. This extends to requiring a pair of attributes for every datatype supported by the database.

Naga's Datomic integration declares the following attributes for Zuko to use:

| Datatype | Linked list property | Containership property |
|----------|----------------------|------------------------|
| entity   | :tg/first            | :tg/contains           |
| string   | :tg/first-s          | :tg/contains-s         |
| boolean  | :tg/first-b          | :tg/contains-b         |
| long     | :tg/first-l          | :tg/contains-l         |
| bigint   | :tg/first-bi         | :tg/contains-bi        |
| float    | :tg/first-f          | :tg/contains-f         |
| double   | :tg/first-d          | :tg/contains-d         |
| bigdec   | :tg/first-bd         | :tg/contains-bd        |
| instant  | :tg/first-dt         | :tg/contains-dt        |
| uuid     | :tg/first-uu         | :tg/contains-uu        |
| uri      | :tg/first-u          | :tg/contains-u         |
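Selecting a pair of attributes from this table based on the type of an array entry might look like the following. `typed-list-attrs` is hypothetical, not Naga's implementation; it assumes JVM Clojure types (for example, `java.util.Date` for instants and `java.net.URI` for URIs) and treats any other value as an entity.

```clojure
(defn typed-list-attrs
  "Hypothetical sketch: returns [linked-list-attr containership-attr] for a
  value, following the table above. Unrecognized values are treated as entities."
  [v]
  (condp instance? v
    String               [:tg/first-s  :tg/contains-s]
    Boolean              [:tg/first-b  :tg/contains-b]
    Long                 [:tg/first-l  :tg/contains-l]
    clojure.lang.BigInt  [:tg/first-bi :tg/contains-bi]
    java.math.BigInteger [:tg/first-bi :tg/contains-bi]
    Float                [:tg/first-f  :tg/contains-f]
    Double               [:tg/first-d  :tg/contains-d]
    java.math.BigDecimal [:tg/first-bd :tg/contains-bd]
    java.util.Date       [:tg/first-dt :tg/contains-dt]
    java.util.UUID       [:tg/first-uu :tg/contains-uu]
    java.net.URI         [:tg/first-u  :tg/contains-u]
    ;; default: entities (nested objects or node references)
    [:tg/first :tg/contains]))

(typed-list-attrs "one")   ;; => [:tg/first-s :tg/contains-s]
(typed-list-attrs 42)      ;; => [:tg/first-l :tg/contains-l]
```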

Structures With Arrays

Arrays are not encoded directly, but only as a nested structure within an entity.

The array described above comes from:

(def data [{:name "numbers" :values ["one" "two" "three" "four"]}])
(def tx (d/transact conn {:tx-data data}))
(pprint (:tx-data @tx))
(#datom [:tg/node-14883 :name "numbers" 8 true]
 #datom [:tg/node-14887 :tg/first "four" 8 true]
 #datom [:tg/node-14886 :tg/first "three" 8 true]
 #datom [:tg/node-14886 :tg/rest :tg/node-14887 8 true]
 #datom [:tg/node-14885 :tg/first "two" 8 true]
 #datom [:tg/node-14885 :tg/rest :tg/node-14886 8 true]
 #datom [:tg/node-14884 :tg/first "one" 8 true]
 #datom [:tg/node-14884 :tg/rest :tg/node-14885 8 true]
 #datom [:tg/node-14884 :tg/contains "one" 8 true]
 #datom [:tg/node-14884 :tg/contains "two" 8 true]
 #datom [:tg/node-14884 :tg/contains "three" 8 true]
 #datom [:tg/node-14884 :tg/contains "four" 8 true]
 #datom [:tg/node-14883 :tg/owns :tg/node-14884 8 true]
 #datom [:tg/node-14883 :values :tg/node-14884 8 true]
 #datom [:tg/node-14883 :db/ident :tg/node-14883 8 true]
 #datom [:tg/node-14883 :tg/entity true 8 true])

[Figure: Entity with nested array]
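Putting the linked list and containership statements together, an entity with an array-valued attribute can be sketched as below. `encode-with-array` is hypothetical, not Zuko's code: it assumes attribute values are either scalars or non-empty vectors of scalars, and that `alloc` returns a fresh node on each call.

```clojure
(defn encode-with-array
  "Hypothetical sketch: encodes a top-level entity whose values are scalars
  or non-empty vectors of scalars. Returns a vector of [s p o] triples."
  [alloc entity]
  (let [owner   (alloc)
        triples (reduce-kv
                 (fn [acc a v]
                   (if (vector? v)
                     (let [nodes  (vec (repeatedly (count v) alloc))
                           head   (first nodes)
                           last-i (dec (count nodes))]
                       (-> acc
                           ;; linked list: :tg/first on each node, :tg/rest to the next
                           (into (mapcat (fn [i x]
                                           (cond-> [[(nodes i) :tg/first x]]
                                             (< i last-i) (conj [(nodes i) :tg/rest (nodes (inc i))])))
                                         (range) v))
                           ;; containership: the head node contains every value
                           (into (map (fn [x] [head :tg/contains x]) v))
                           ;; the entity owns the list and refers to it by the attribute
                           (conj [owner :tg/owns head] [owner a head])))
                     (conj acc [owner a v])))
                 []
                 entity)]
    (cond-> triples
      (not (contains? entity :db/ident)) (conj [owner :db/ident owner])
      true                               (conj [owner :tg/entity true]))))
```

Seeding `alloc` with a counter that starts at 14882 produces the same set of statements as the "numbers" datoms above, although in a different order.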

Structures and Queries

All of the above can be nested arbitrarily, with objects and arrays being valid nested structures within any other object.

The following is a set of objects describing the children in three separate families (the Bennets, the Bingleys, and the Fitzwilliams):

(def data [{:db/ident "bennet"
            :type "family"
            :name "Bennet"
            :children [{:name "Jane"}
                       {:name "Elizabeth"}
                       {:name "Mary"}
                       {:name "Catherine"}
                       {:name "Lydia"}]}
           {:db/ident "bingley"
            :type "family"
            :name "Bingley"
            :children [{:name "Charles"}
                       {:name "Caroline"}
                       {:name "Louisa" :surname "Hurst"}]}
           {:db/ident "fitzwilliam"
            :type "family"
            :name "Fitzwilliam"
            :children [{:name "Catherine" :surname "de Bourgh"}
                       {:name "Anne" :surname "Darcy"}]}])
(def tx (d/transact conn {:tx-data data}))
(pprint (:tx-data @tx))

The listing is quite long, but included here so that the full structure can be referenced:

(#datom [:tg/node-14890 :db/ident "bennet" 9 true]
 #datom [:tg/node-14890 :type "family" 9 true]
 #datom [:tg/node-14890 :name "Bennet" 9 true]
 #datom [:tg/node-14892 :name "Jane" 9 true]
 #datom [:tg/node-14890 :tg/owns :tg/node-14892 9 true]
 #datom [:tg/node-14894 :name "Elizabeth" 9 true]
 #datom [:tg/node-14890 :tg/owns :tg/node-14894 9 true]
 #datom [:tg/node-14896 :name "Mary" 9 true]
 #datom [:tg/node-14890 :tg/owns :tg/node-14896 9 true]
 #datom [:tg/node-14898 :name "Catherine" 9 true]
 #datom [:tg/node-14890 :tg/owns :tg/node-14898 9 true]
 #datom [:tg/node-14900 :name "Lydia" 9 true]
 #datom [:tg/node-14890 :tg/owns :tg/node-14900 9 true]
 #datom [:tg/node-14899 :tg/first :tg/node-14900 9 true]
 #datom [:tg/node-14897 :tg/first :tg/node-14898 9 true]
 #datom [:tg/node-14897 :tg/rest :tg/node-14899 9 true]
 #datom [:tg/node-14895 :tg/first :tg/node-14896 9 true]
 #datom [:tg/node-14895 :tg/rest :tg/node-14897 9 true]
 #datom [:tg/node-14893 :tg/first :tg/node-14894 9 true]
 #datom [:tg/node-14893 :tg/rest :tg/node-14895 9 true]
 #datom [:tg/node-14891 :tg/first :tg/node-14892 9 true]
 #datom [:tg/node-14891 :tg/rest :tg/node-14893 9 true]
 #datom [:tg/node-14891 :tg/contains :tg/node-14892 9 true]
 #datom [:tg/node-14891 :tg/contains :tg/node-14894 9 true]
 #datom [:tg/node-14891 :tg/contains :tg/node-14896 9 true]
 #datom [:tg/node-14891 :tg/contains :tg/node-14898 9 true]
 #datom [:tg/node-14891 :tg/contains :tg/node-14900 9 true]
 #datom [:tg/node-14890 :tg/owns :tg/node-14891 9 true]
 #datom [:tg/node-14890 :children :tg/node-14891 9 true]
 #datom [:tg/node-14890 :tg/entity true 9 true]
 #datom [:tg/node-14901 :db/ident "bingley" 9 true]
 #datom [:tg/node-14901 :type "family" 9 true]
 #datom [:tg/node-14901 :name "Bingley" 9 true]
 #datom [:tg/node-14903 :name "Charles" 9 true]
 #datom [:tg/node-14901 :tg/owns :tg/node-14903 9 true]
 #datom [:tg/node-14905 :name "Caroline" 9 true]
 #datom [:tg/node-14901 :tg/owns :tg/node-14905 9 true]
 #datom [:tg/node-14907 :name "Louisa" 9 true]
 #datom [:tg/node-14907 :surname "Hurst" 9 true]
 #datom [:tg/node-14901 :tg/owns :tg/node-14907 9 true]
 #datom [:tg/node-14906 :tg/first :tg/node-14907 9 true]
 #datom [:tg/node-14904 :tg/first :tg/node-14905 9 true]
 #datom [:tg/node-14904 :tg/rest :tg/node-14906 9 true]
 #datom [:tg/node-14902 :tg/first :tg/node-14903 9 true]
 #datom [:tg/node-14902 :tg/rest :tg/node-14904 9 true]
 #datom [:tg/node-14902 :tg/contains :tg/node-14903 9 true]
 #datom [:tg/node-14902 :tg/contains :tg/node-14905 9 true]
 #datom [:tg/node-14902 :tg/contains :tg/node-14907 9 true]
 #datom [:tg/node-14901 :tg/owns :tg/node-14902 9 true]
 #datom [:tg/node-14901 :children :tg/node-14902 9 true]
 #datom [:tg/node-14901 :tg/entity true 9 true]
 #datom [:tg/node-14908 :db/ident "fitzwilliam" 9 true]
 #datom [:tg/node-14908 :type "family" 9 true]
 #datom [:tg/node-14908 :name "Fitzwilliam" 9 true]
 #datom [:tg/node-14910 :name "Catherine" 9 true]
 #datom [:tg/node-14910 :surname "de Bourgh" 9 true]
 #datom [:tg/node-14908 :tg/owns :tg/node-14910 9 true]
 #datom [:tg/node-14912 :name "Anne" 9 true]
 #datom [:tg/node-14912 :surname "Darcy" 9 true]
 #datom [:tg/node-14908 :tg/owns :tg/node-14912 9 true]
 #datom [:tg/node-14911 :tg/first :tg/node-14912 9 true]
 #datom [:tg/node-14909 :tg/first :tg/node-14910 9 true]
 #datom [:tg/node-14909 :tg/rest :tg/node-14911 9 true]
 #datom [:tg/node-14909 :tg/contains :tg/node-14910 9 true]
 #datom [:tg/node-14909 :tg/contains :tg/node-14912 9 true]
 #datom [:tg/node-14908 :tg/owns :tg/node-14909 9 true]
 #datom [:tg/node-14908 :children :tg/node-14909 9 true]
 #datom [:tg/node-14908 :tg/entity true 9 true])

[Figure: Family Structure]

Given this structure, which families have a child named Catherine?

=> (def family-db (d/db conn))
=> (d/q '[:find [?family-name ...]
          :where [?family :name ?family-name]
                 [?family :children ?children]
                 [?children :tg/contains ?child]
                 [?child :name "Catherine"]]
        family-db)
("Bennet" "Fitzwilliam")
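To show what the query's joins do, here is the same pattern evaluated over a small set of plain triples. `families-with-child` is a hypothetical sketch, not how Asami executes queries, and the sample triples are a subset of the datoms above.

```clojure
(defn families-with-child
  "Hypothetical sketch of the query's joins over [s p o] triples:
  family :name -> :children list -> :tg/contains child -> child :name."
  [triples child-name]
  (let [objects (fn [s p]
                  (for [[s' p' o] triples :when (and (= s s') (= p p'))] o))]
    (distinct
     (for [[family p family-name] triples
           :when (= p :name)
           children (objects family :children)
           child    (objects children :tg/contains)
           :when (some #{child-name} (objects child :name))]
       family-name))))

(def children-triples
  [[:tg/node-14890 :name "Bennet"]
   [:tg/node-14890 :children :tg/node-14891]
   [:tg/node-14891 :tg/contains :tg/node-14892]
   [:tg/node-14891 :tg/contains :tg/node-14898]
   [:tg/node-14892 :name "Jane"]
   [:tg/node-14898 :name "Catherine"]
   [:tg/node-14901 :name "Bingley"]
   [:tg/node-14901 :children :tg/node-14902]
   [:tg/node-14902 :tg/contains :tg/node-14903]
   [:tg/node-14903 :name "Charles"]
   [:tg/node-14908 :name "Fitzwilliam"]
   [:tg/node-14908 :children :tg/node-14909]
   [:tg/node-14909 :tg/contains :tg/node-14910]
   [:tg/node-14910 :name "Catherine"]])

(families-with-child children-triples "Catherine")
;; => ("Bennet" "Fitzwilliam")
```

Because :tg/contains links the head of each list directly to every element, the query never has to follow the :tg/rest chain.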

If the query selects the families' idents instead, then the family entities can be retrieved:

(map #(d/entity family-db %)
     (d/q '[:find [?family-ident ...]
            :where [?family :db/ident ?family-ident]
                   [?family :children ?children]
                   [?children :tg/contains ?child]
                   [?child :name "Catherine"]]
          family-db))

This returns the entity for each family:

({:type "family",
  :name "Bennet",
  :children
  ({:name "Jane"}
   {:name "Elizabeth"}
   {:name "Mary"}
   {:name "Catherine"}
   {:name "Lydia"})}
 {:type "family",
  :name "Fitzwilliam",
  :children
  ({:name "Catherine", :surname "de Bourgh"}
   {:name "Anne", :surname "Darcy"})})