Principles and concepts
- Design principles
- Register, RegisterItem, Entity
- Information model
- Referenced and Managed entities
- Status and Lifecycle
- History and versioning
- URI patterns
- Containers and bulk publication
- Validation
- Delegation
We minimise duplication of information within the registry to simplify data maintenance. Specific impacts of this principle are:
- Where there are two ways to link resources (via a pair of inverse properties, such as dct:isVersionOf and dct:hasVersion) we standardise on one direction for internal use.
- Some properties are “virtual”, they are derived from the stored information on request rather than explicitly maintained. In particular, non-monotonic properties such as version:currentVersion are avoided in the internal representation to simplify merging of updates.
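As an illustration of a virtual property, the sketch below derives a "current version" on request from stored replacement links instead of storing it. How the registry actually computes version:currentVersion is not specified here; the dictionary layout, the function name and the rule "the current version is the one that nothing replaces" are all assumptions made for this example.

```python
def current_version(replaces):
    """Derive the current version from stored replacement links.

    `replaces` maps each version identifier to the version it replaces
    (None for the first version). This layout is hypothetical.
    """
    replaced = set(replaces.values())
    # The current version is the only one that no other version replaces.
    candidates = [v for v in replaces if v not in replaced]
    if len(candidates) != 1:
        raise ValueError("expected exactly one unreplaced version")
    return candidates[0]


# Adding a new version only adds one link; nothing stored has to be retracted.
history = {"v1": None, "v2": "v1", "v3": "v2"}
latest = current_version(history)
```

Because the stored links only ever grow, two sets of updates can be merged by taking their union, which is the benefit the principle is aiming for.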
The registry supports complex machinery for representing the metadata about the status of an entry within a register and for recording the history of changes to entries. The majority of register users are not expected to be interested in such details and simply wish to know what things are registered where. The API design provides simplified default views of the information model in which details of metadata and versioning are hidden. Register users who need access to these details have to explicitly request higher-fidelity views.
The simplified views involve a more radical transformation on the stored data than just omitting some properties. This means that there is a mismatch between the simplified information model presented by the default view and the full model exposed to users who directly access the internal store via SPARQL. We accept the extra burden on such advanced users in return for the simpler operation for the majority of users.
The API should, where reasonable, follow REST principles. Specifically, any resource in the system should be identified by a URI and be manipulable by standard HTTP verbs (GET, PUT, DELETE, PATCH).
The URIs for these resources should follow a simple pattern with as few restrictions as possible. The URI patterns should be predictable (“hackable”) to enable developers to understand the structure of the registry and to more easily identify errors. However, hacking of URIs should not be required in order to operate the registry.
Fundamentally the registry is a service to enable organizations to maintain controlled lists.
The essential concept here is the notion of a register.
A register is a single controlled collection (list). Each register is operated on behalf of some owner organization which provides the authority for the collection. For example, the WMO might wish to provide a list of approved codes to represent “runway deposits”, entries in which might represent “slush” or “snow”. Critically there is some governance regime which defines what can be registered, the process to be followed and the types of changes that may be made following acceptance of entry into a register. The user of such a register is assured by the owner that the set of codes within that register is complete and definitive for the declared usage – any code which is missing is not approved.
The owner may delegate the running of the register and enforcement of the governance policy to a manager who is responsible for validation and maintenance of register entries.
The types of things that can be entered in a register are completely open. They include codes, ontology concepts, complete ontologies, coordinate reference systems, units of measure, spatial objects, organizations, licenses etc. We use the term entity (see footnote 1) to describe these things but there is no explicit reg:Entity class; anything which can be given a URI can be registered.
A register (e.g. a subregister) can itself be registered as an entry in another register (e.g. a principal register), allowing us to create a hierarchy of registers.
Whilst not recommended, the registry permits a register to contain both subregisters and other types of entity. However, this practice may be prohibited for specific registers at the discretion of the register manager by specifying an appropriate governance policy.
As well as simply enumerating the entities which have been registered, the registry records information such as the status of an entry, the category of the entity in a classification scheme, aliases for the entity and so forth. This metadata is not intrinsic to the entity but is an aspect of how the organization regards the entity. The same entity might be entered in several registers and have a different status and classification in each, even within the same registry service. To achieve this the register maintains a set of metadata records called register items (represented by reg:RegisterItem), each of which describes an entry.
1 We use entity rather than thing so as to avoid confusion with owl:Thing which with DL-semantics (as opposed to RDF-semantics) would preclude registering classes and properties.
The full internal information model is described by the reg ontology http://purl.org/linked-data/registry# [doc] which is summarised in the diagram below:
TODO: identify which properties of reg:Register and reg:RegisterItem are ‘rigid’ (proposal from Jeremy 21-Nov)
We regard a register as being a container of the entities registered in it. This is represented in the information model by making reg:Register a sub-class of the Linked Data Platform container class ldp:Container. In an LDP Container there is a direct membership property linking a container to its members. This property can be declared explicitly using ldp:membershipPredicate or defaults to rdfs:member. In keeping with our principle of minimising duplication we infer this membership predicate on demand as illustrated in the figure below. However, please note that these membership properties are materialised in the responses from the registry.
This enables us to present a simple view of a register as a direct container of entities. In keeping with our principles that common cases should be simple this simplified container view is the default view returned when a register instance is fetched. More advanced use of the API is required in order to retrieve the full internal view (e.g. including the register item metadata).
There are some situations (using a register as a specific container type, see Bulk registration) where it is preferable to have the membership predicate operate in the reverse direction, from entity to container. This is not supported by the current Linked Data Platform specification so we provide reg:inverseMembershipPredicate for this purpose.
A register is also a void:Dataset, enabling VoID-aware clients to discover information about the register and its contents.
As well as maintaining registers of entities the registry service provides a repository function where it stores descriptions of entities itself.
An entity whose authoritative definition is stored in the registry is called a managed entity. It is possible for a suitably authorised maintainer to update the definition of a managed entity through the API (a PUT or PATCH operation on the entity URI).
An entity whose master definition is held elsewhere can still be entered into a register in which case it is called a referenced entity. This may be simply an external resource, created and managed by another organization in a different DNS domain, or it may be a resource in a different register within the same registry. When such an entity is registered the submitter still needs to provide a minimal description of the referenced entity. By default this minimal description requires a label (rdfs:label, or a sub-property of that) and a type (rdf:type) though a particular register may impose additional constraints. The register will record the submitted description of the referenced entity. It is possible to update this recorded description if the entity changes.
Where a referenced entity is managed externally to the registry, the local storage of statements about the registered entity facilitates the indexing of locally stored information to improve search performance, and ensures that information about the registered entity is available even if the remote system is (temporarily) unavailable.
Furthermore, one may choose to include supplemental information in the locally stored definition that is not included in the official source definition.
TODO: This section has been substantially revised following the addition of a different sort of “not valid” code to ISO 19135. Needs review.
A registered item has an associated status within the register. This status is not an intrinsic attribute of the entity itself but rather a statement of how the entity is regarded by the register’s authority (its owner).
The status codes are arranged in a hierarchy which reflects how groups of codes are treated. They are represented as SKOS concepts in a reg:StatusScheme using skos:broader links to represent the hierarchy.
reg:statusNotAccepted - corresponds to ISO 19135:2005 'notValid'
  reg:statusSubmitted - corresponds to ISO 19135:(draft) 'submitted'
  reg:statusInvalid - corresponds to ISO 19135:(draft) 'invalid'
reg:statusAccepted
  reg:statusValid - corresponds to ISO 19135:2005 'valid'
    reg:statusExperimental
    reg:statusStable
  reg:statusDeprecated
    reg:statusSuperseded - corresponds to ISO 19135:2005 'superseded'
    reg:statusRetired - corresponds to ISO 19135:2005 'retired'
At the top level status codes are arranged in two groups notAccepted and accepted.
Items with a code which is a specialisation of notAccepted will not be included in the normal listing of the register members. They are either entries which have been submitted but not yet approved ( submitted ) or have been deemed flawed ( invalid ).
Items with a code which is a specialisation of accepted are visible in the normal listing of register members. The visible items either have a code of valid (meaning they are suitable for use) or deprecated (meaning they should not be used for new applications though they may still be employed by existing applications). Only valid entries are used in response to a validation request.
An item may become deprecated in one of two ways. It may be simply withdrawn ( retired ) or it may be replaced by an alternative ( superseded ). In the case of a superseded item it is possible to discover the item that succeeded it (via ^reg:predecessor, the inverse of reg:predecessor).
A valid item may also be marked as experimental to communicate the intention that the item is being trialed and might be withdrawn or replaced. Conversely a valid item may optionally be marked as stable indicating that no change is currently anticipated.
In the normal lifecycle a new entry in a register is given status submitted and thus is not shown in the list of register members. Once the entry is approved it becomes valid and normally remains in the list of register members permanently. If the entry is later deprecated (either by being withdrawn from use or being superseded) then it remains visible but is not used for validation.
Sometimes an entry is deemed too flawed to approve, or is approved and then a serious problem is discovered, and the item should be removed from the list of members – not even appearing as retired. This process is termed invalidation and is supported by the invalid status code. The DELETE API call sets this status.
The status code hierarchy means that an item with a narrow code can be inferred to also have the corresponding set of broader codes. For example a retired item is also deprecated and accepted. The registry service implementation may choose whether to materialize these inferred codes in the internal data store or perform the inference at query time.
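The inference over the status hierarchy can be sketched as follows. This is a minimal illustration that performs the inference at query time; the BROADER table is a hand-written copy of the skos:broader links as described in the text above, and the function names are hypothetical rather than part of any registry API.

```python
# Hypothetical in-memory copy of the skos:broader links between status codes.
BROADER = {
    "reg:statusSubmitted": "reg:statusNotAccepted",
    "reg:statusInvalid": "reg:statusNotAccepted",
    "reg:statusValid": "reg:statusAccepted",
    "reg:statusDeprecated": "reg:statusAccepted",
    "reg:statusExperimental": "reg:statusValid",
    "reg:statusStable": "reg:statusValid",
    "reg:statusSuperseded": "reg:statusDeprecated",
    "reg:statusRetired": "reg:statusDeprecated",
}


def inferred_statuses(status):
    """Return the status followed by every broader status, narrowest first."""
    chain = [status]
    while chain[-1] in BROADER:
        chain.append(BROADER[chain[-1]])
    return chain


def is_visible(status):
    """Accepted items (and their specialisations) appear in the member listing."""
    return "reg:statusAccepted" in inferred_statuses(status)


def usable_for_validation(status):
    """Only valid items (and their specialisations) satisfy a validation request."""
    return "reg:statusValid" in inferred_statuses(status)
```

For example, inferred_statuses("reg:statusRetired") yields the retired, deprecated and accepted codes, so a retired item stays visible but is not used for validation.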
TODO: add status lifecycle diagram
The registry service maintains an accessible history of changes to registers and registered entities. This is to enable registry users to check whether a registration was valid at a particular time, for example to enable an organization to require use of a code list as defined on a particular date.
This versioning is independent of any audit trail maintained by the registry implementation for the purposes of maintenance and security checking.
Following best practices of identity management, once an entity has been accepted as a member of a register with a given identifier the semantics of that entity must remain consistent.
If the definition of an entity changes in a way that affects its semantics then a new entity with the amended semantics shall be registered, resulting in the creation of a new entry in the register. Once the inclusion of the new entry within the register has been approved by the register manager it can be recorded as superseding the previous entry (the new reg:RegisterItem links via reg:predecessor to the old reg:RegisterItem, which is then marked with status reg:statusSuperseded).
Prior to the acceptance of an entity within a register, there are no restrictions on the changes that may be applied to the definition of that entity. This enables errors from the initial submission to be corrected.
However, it is permissible to make some inessential changes to an entity (for example, to correct a mistake in a label) and still maintain the same identity and registration. The register’s governance policy defines what constitutes an inessential change and in the general case human decision making may be required to implement the policy. When such a change is made then the new definition will be stored and a new version of the reg:RegisterItem is created to reference the updated definition.
Similarly the metadata about an entry including its status, category and alias links may be changed. This again will result in a new version of the reg:RegisterItem being created but in that case the reference to the entity itself is unchanged.
It is important for some use cases of the registry service that it be possible to refer to the state of a register at a particular time. The essential characteristics of a register include the list of accepted entries (i.e. those where the reg:RegisterItem/reg:status is equal to reg:statusAccepted or a sub-status thereof). So if the list of accepted entries changes this implies that the register state has changed and a new version of the reg:Register instance will be created.
Note that the submission of an entity for registration does not change the state of the register to which it is proposed. Although a new register item is created to relate the entity to the register upon registration, the entity is not considered to be a member of the register until after it has been accepted (denoted with reg:status equal to reg:statusAccepted and sub-statuses thereof).
Similarly, where the invalidation of an entity (e.g. as the result of an entity being deemed to have a substantive error) triggers the status of an entity to be updated to reg:statusInvalid, the entity is considered to be removed from the register (albeit not deleted), thus changing the set of members of the register and implying a change to the state of that register.
The register metadata may also be updated. This change will also result in a new version of the reg:Register instance. An inessential change to a particular register or registered entity item does not change the state of the register, unless it results in a change of visibility level (e.g. a change in status from reg:statusNotAccepted to reg:statusAccepted or vice versa). This avoids the need to cascade changes all the way up the register hierarchy on every change to the definition or metadata of a given entity.
For recording the history of version changes the registry service adopts a “hub/spoke” model using the version vocabulary http://purl.org/linked-data/version#
Each resource whose history trail is to be explicitly maintained is an instance of version:VersionedThing. A particular version is an instance of version:Version and is annotated with a version string (owl:versionInfo) and a validity interval (version:interval). Each version links to the versioned thing via dct:isVersionOf and to the previous version via dct:replaces.
Properties of a versioned thing that are essential to its nature (e.g. its type) are termed rigid and the version vocabulary provides a version:rigidProperty annotation to declare the rigid properties of a class of versioned things. The rigid properties MAY be stored on the base version:VersionedThing since they don’t change. All the properties of a particular version, including the rigid properties are materialized on each version:Version instance.
Typically, a version:Version will include both the mutable properties for that specific version and the rigid properties attributed to the base version:VersionedThing.
We refer to types which are maintained using this versioning model as Versioned types. An instance of the registry service includes an explicit register (/system/versionedTypes) which enumerates the versioned types for the service instance. By default reg:Register and reg:RegisterItem are versioned types.
It is not always necessary or possible to use this explicit version strategy. For example a referenced entity managed externally to the registry may be updated whilst its URI remains unchanged. Even for managed entities, the additional complexity of managing a set of identifiable versions of the entity may be unwarranted. In these cases, the registry maintains a historical copy of each version of what it knows about the registered entities (e.g. the information provided about that entity at the time of registration) as separate logical named graphs (see footnote 2) referenced from the register item using an instance of reg:EntityReference which defines the entity resource and the graph in which it occurs. Each time the information about the entity is amended, a new named graph is created that is referenced from a new version of the associated register item.
In this way it is possible to trace the history of any entity, even an external one, by walking the trail of dct:replaces links to find the desired version of the register item and following the reg:definition link to the appropriate entity reference and thus the version of the entity.
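Walking the version trail can be sketched as below. This is only an illustration over an in-memory table: the version identifiers, dates and graph names are invented, and we assume each version records the start of its validity interval and that intervals are contiguous, so the first version reached whose start is not after the requested time is the one in force.

```python
from datetime import date

# Hypothetical version records for one register item, keyed by version id.
# "replaces" mirrors dct:replaces, "start" is the beginning of the validity
# interval, and "definition" names the graph holding that entity description.
ITEM_VERSIONS = {
    "_slush:1": {"replaces": None, "start": date(2013, 1, 10), "definition": "graph-a"},
    "_slush:2": {"replaces": "_slush:1", "start": date(2013, 2, 20), "definition": "graph-b"},
    "_slush:3": {"replaces": "_slush:2", "start": date(2013, 4, 5), "definition": "graph-c"},
}


def definition_at(latest_version, when, versions=ITEM_VERSIONS):
    """Walk back along the replaces links to the version in force at `when`."""
    current = latest_version
    while current is not None:
        record = versions[current]
        if record["start"] <= when:
            # Found the item version; its definition identifies the entity graph.
            return current, record["definition"]
        current = record["replaces"]
    return None  # the item had not been registered at that time


found = definition_at("_slush:3", date(2013, 3, 1))
```

A real implementation would follow the links with queries against the store rather than a dictionary, but the traversal order is the same.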
This facility enables the registry to maintain managed entities even without explicit identification versioning. Such entities are termed stored entities.
In some registry implementations it may be permissible to add new types in the versionedTypes register. This would cause the registry to handle any managed entities which match those types using the hub/spoke model. This means that it is possible to externally reference a particular version of such an entity rather than just a version of the corresponding register item. This complexity is deemed unnecessary for the common use cases and is not a required feature of a registry implementation.
2 Typically we would expect this to be implemented using a quad store with each named graph being part of the registry’s SPARQL dataset. However, implementations are permitted to record history using a low-cost external mechanism such as a key-value store or a source control system.
Registers and managed entities are assigned URIs within the DNS domain of the registry service.
| Resource | URI pattern |
|---|---|
| Register | http://registry/{register}/../{subregister} |
| Registered entity | http://registry/{register}/../{subregister}/{entity} |
| RegisterItem | http://registry/{register}/../{subregister}/_{item} |
| System reserved | http://registry/system |
| VersionedTypes register | http://registry/system/versionedTypes |
Where registry represents whatever the base URI is for the registry e.g. location.data.gov.uk.
The register hierarchy is reflected in the URL path hierarchy in the obvious way.
There are only three restrictions on naming of registers and entities:
- The top level register system is reserved for operation of the registry service.
- No register or managed entity can have a “_” as the first character of the final segment.
- Items (i.e. resources of type reg:RegisterItem) should always have a “_” as the first character of the final segment.
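The three naming restrictions can be checked at allocation time with something like the sketch below. The function name and the example paths are hypothetical, and the sketch treats the "should" in the third rule as a hard requirement, which is an assumption. It is a check for allocating new names only; it is not a way to decide what type an existing resource has.

```python
def name_is_allowed(path, is_register_item):
    """Check a proposed registry-relative path against the naming restrictions."""
    segments = [s for s in path.strip("/").split("/") if s]
    if not segments:
        return False
    # Rule 1: the top-level register "system" is reserved for the service.
    if segments[0] == "system":
        return False
    starts_with_underscore = segments[-1].startswith("_")
    # Rules 2 and 3: a leading "_" on the final segment is used by register
    # items and by nothing else.
    return starts_with_underscore == is_register_item
```

For instance, name_is_allowed("/codes/runway/slush", is_register_item=False) is accepted, while the same path with a final segment of "_slush" is only accepted for a register item.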
These resources are all linked data resources: they respond to a GET request with an appropriate RDF description. Note that the type of the resource should only be determined by examining this RDF response, not the syntactic form of the URI. The use of “_” to distinguish register items and entities is purely for the purposes of clash-free allocation and conceptual understanding; it is not part of the semantics of the URI. It is not, in any case, possible to distinguish a register from other types of entity just from its URI syntax.
The register resources also fulfil the contract for Linked Data Platform (LDP) Collections (as defined by the first pass working draft http://www.w3.org/TR/2012/WD-ldp-20121025/) to support listing, paging and adding of register entries.
The register items and the managed resources also support modification (for appropriately authorised users) through PUT and PATCH operations. By implication, statements about referenced entities can be amended through PUT and PATCH operations to the associated register item.
For details of these operations see API.
Given the flexibility of the URI structure a registry service can be regarded as a general purpose publication platform for distributed publication of RDF resources – optimized for high governance, modest volume, publication.
It is possible for the logical base URI of a registry service to differ from the physical DNS domain at which it is hosted.
The logical base URI is the URI used when new RDF resources representing registers, register items and managed entities are created. These URIs are used in the internal RDF storage of the registry and are returned in any RDF API responses.
This can differ from the physical DNS domain at which the registry service is operating in at least two situations:
- A staging version of a registry service may be deployed to allow evaluation and testing before being released for use at the real DNS domain.
- A registry owner may wish to delegate some part(s) of the registry namespace to another party to run on their behalf via the delegation mechanism but they may still wish those resources (registers, register items and managed entities) to appear within their root namespace.
A registry implementation SHOULD support a mechanism to define the logical base URI for a given service deployment. HTML views of each registry resource should take this into account so that the HTML links in such views target the physical deployment URL of the service.
For example, consider a situation where the UK Location programme ran a registry rooted at http://location.data.gov.uk but wanted to delegate http://location.data.gov.uk/id/eaew to be stored and managed on a separate registry service run by the Environment Agency. In that case the Environment Agency would run a registry service whose logical base URI was http://location.data.gov.uk but whose physical location might be http://registry.environment-agency.gov.uk (all these URIs are purely examples). In that way the register http://location.data.gov.uk/id/eaew can delegate to http://registry.environment-agency.gov.uk/id/eaew and all of the resources within that part of the location namespace would resolve correctly.
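The mapping from logical URIs to the physical deployment, as needed when generating links for HTML views, might look like the following sketch. The function name is hypothetical and the two base URIs are simply the example values from the text.

```python
def to_physical(url, logical_base, physical_base):
    """Rewrite a logical registry URI so that it targets the physical deployment."""
    logical = logical_base.rstrip("/")
    physical = physical_base.rstrip("/")
    # Only URIs inside the logical namespace are rewritten.
    if url == logical or url.startswith(logical + "/"):
        return physical + url[len(logical):]
    return url


link = to_physical(
    "http://location.data.gov.uk/id/eaew",
    "http://location.data.gov.uk",
    "http://registry.environment-agency.gov.uk",
)
```

URIs outside the logical namespace are returned unchanged, so links to external referenced entities are not disturbed.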
In the rest of the design documentation we simply use the notation http://registry to stand in for both the logical base URI and the assumed deployment URL.
Note: This section is under discussion and subject to change.
A register is simply a collection of entities (both in the sense of being a void data set and of being an LDP collection).
For some applications it is useful to also declare the register as being an instance of other collection types such as a skos:Collection or skos:ConceptScheme. The ldp:membershipPredicate can be used to declare the appropriate relation between the register and the registered entity (skos:member, skos:hasTopConcept), or reg:inverseMembershipPredicate can be used to declare a relation in the other direction (rdfs:isDefinedBy, skos:inScheme). In this way the registry service can be used to create and manage such collection types.
To further facilitate this usage the registry service also supports bulk registration of instances of such collections. For example it is possible to upload an entire SKOS Collection to a register, creating a sub-register corresponding to the Collection and entries within that sub-register for each Concept in the collection. This enables convenient publication of collections using the “slash URI” pattern, including ontologies. The set of collection types which are supported for bulk upload is defined by the registry service through the /system/bulkCollectionTypes register. The supported types must be listed in that register. It is service-dependent whether new types can be added to the bulk-collection-types register.
The notion of validation arises in two distinct parts of the operation of the registry service.
When an entity is submitted to a register there are two stages to the process of validation.
Firstly a technical validation process is run at the point of submission. This checks that:
- the submission is syntactically valid;
- the entity has at least one value for each of the mandatory properties rdf:type and rdfs:label (in the case of rdfs:label there must be a value within an operating language of the register) and any declared rigid properties;
- if the submission is an internal referenced entity which falls within the namespace of the registry (but outside that of the target register) then it must already exist; in that case the registry will fetch the mandatory properties from the existing registration so these need not be explicitly included in the submission payload;
- if the register declares one or more SPARQL ASK validation queries (reg:validationQuery) then all of those queries, when applied to the submitted graph, must return true.
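A simplified version of these checks can be sketched as follows. This is an illustration only: triples are modelled as plain tuples, labels as (text, language) pairs, and each validation query is stood in for by a Python callable, all of which are assumptions made for the example. The sketch omits the syntax check, rigid properties, sub-properties of rdfs:label and the lookup of internal referenced entities.

```python
def technical_validation(entity, triples, operating_languages, ask_checks=()):
    """Return a list of problems; an empty list means the submission passes."""
    problems = []
    # Group the submitted property values for the entity by predicate.
    values = {}
    for subject, predicate, obj in triples:
        if subject == entity:
            values.setdefault(predicate, []).append(obj)
    if not values.get("rdf:type"):
        problems.append("missing rdf:type")
    # At least one label must be in an operating language of the register.
    labels = values.get("rdfs:label", [])
    if not any(lang in operating_languages for _text, lang in labels):
        problems.append("no rdfs:label in an operating language")
    # Stand-ins for the register's ASK queries: each must return True.
    for name, check in ask_checks:
        if not check(triples):
            problems.append("validation query failed: " + name)
    return problems


submission = [
    ("ex:slush", "rdf:type", "skos:Concept"),
    ("ex:slush", "rdfs:label", ("Slush", "en")),
]
result = technical_validation("ex:slush", submission, {"en"})
```

Returning a list of problems rather than a single boolean makes it easier to report every failure to the submitter in one response.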
All customization of the technical validation is done through declaration of validation queries.
Note that we do not require external reference entities to resolve at time of submission and do not automatically fetch any properties of them.
If a submission passes technical validation a reg:RegisterItem record will be created (with state statusSubmitted) and linked to the register.
At that point a register-specific approval process will be triggered. This may be a manual vetting process, including appeal options, or it may be automated or a mix of the two. For example, a further technical validation process might be run as an asynchronous background process before triggering a final human approval. The registry provides a means to declare the governance policies (reg:governancePolicy) but the mechanics of this approval process are outside the scope of the registry service itself.
When an entry has been approved its status will be updated to statusValid at which point it becomes a visible entry in the register.
The other notion of validation supported by the registry service is the ability to verify that a given entity, or set of entities, is registered in a given register. This is supported by the entity and validate queries described in API.
In a Linked Data setting a key function of the registry is to provide effective management of a shared URI namespace. In both public sector and enterprise use of Linked Data there is a requirement for multiple organizations to be able to publish reference information into a common namespace. To avoid the registry becoming a scaling bottleneck, especially for update, it must be possible for multiple organizations to serve their own parts of the shared namespace rather than forcing all information to be physically stored in a centralised registry implementation.
To support this requirement the registry service provides three mechanisms for delegating parts of the registry namespace – namespace forwarding, registry federation and register delegation.
In each case the delegation is enabled by registering an entity whose type is a sub-class of reg:Delegated into a parent register. The reg:delegationTarget defines the service to which requests to the registered URI should be forwarded.
This is enabled by registering an entity of type reg:NamespaceForward which indicates a delegation target URI to forward to, and optionally a status code to use for forwarding (reg:forwardingCode).
For example, suppose we have a register http://registry1/register.
If we register a new entry in the register at relative location ext:
<ext> a reg:NamespaceForward ;
    reg:delegationTarget <http://extregistry/root/base> ;
    reg:forwardingCode 307 .
Then a request to http://registry1/register/ext/foo/bar will be forwarded to http://extregistry/root/base/foo/bar by returning an http 307 response with Content-Location: http://extregistry/root/base/foo/bar.
The default forwarding code in the absence of an explicit setting is 307.
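The URL rewriting involved in namespace forwarding can be sketched as below. The function name and signature are hypothetical; the sketch only computes the status code and target URL and leaves the construction of the HTTP response to the surrounding service.

```python
def forward(request_url, entry_url, delegation_target, forwarding_code=307):
    """Return (status code, target URL) for a request at or below a forwarding entry.

    Returns None when the request does not fall under the entry.
    """
    if request_url != entry_url and not request_url.startswith(entry_url + "/"):
        return None
    # Keep whatever follows the entry URL and append it to the delegation target.
    remainder = request_url[len(entry_url):]
    return forwarding_code, delegation_target + remainder


status, target = forward(
    "http://registry1/register/ext/foo/bar",
    "http://registry1/register/ext",
    "http://extregistry/root/base",
)
```

The default argument mirrors the default forwarding code of 307; an entry with an explicit reg:forwardingCode would pass that value instead.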
A registry service implementation is only required to support 30X status codes. It MAY choose to support a proxy configuration indicated by specifying a 200 response code. In this case the registry service itself passes the request to the delegation target and returns the response to the requester. If the service does support proxy mode it SHOULD respect the http cache control headers in the response.
Note that there is no restriction on the behaviour of the delegation target. So entries at and below http://registry1/register/ext in our example will no longer necessarily support the registry API. Search requests to http://registry1/register (or above) will not include any content served from the delegation target.
This is enabled by registering an entity of type reg:FederatedRegister.
For requests to any resource within and below the federated register this mode acts precisely the same as namespace forwarding.
The key difference in this case is that the delegation target is assumed and required to support the full registry API. Search requests to parent registers of the federated register will trigger the search query to also be sent to the delegation target register and the results will be aggregated into the local search results.
The purpose of this delegation mode is to enable other organizations to maintain and serve the list of contents of a register.
This is enabled by registering an entity of type reg:DelegatedRegister with a reg:delegationTarget which gives the SPARQL endpoint for the service which will supply the item and membership information. This is a subtype of reg:Register and all normal register properties apply. In addition the delegated register specifies a partial triple pattern which can be used to enumerate the members of the register (reg:enumerationSubject, reg:enumerationPredicate, reg:enumerationObject).
For example, suppose we have a register http://location.data.gov.uk/id in which we register:
<bathingWaters> a reg:DelegatedRegister ;
    rdfs:label "Bathing waters"@en ;
    dct:description "Bathing Waters from the environment agency"@en ;
    reg:owner <http://reference.data.gov.uk/2011-09-30/id/public-body/environment-agency> ;
    reg:manager <http://reference.data.gov.uk/2011-09-30/id/public-body/environment-agency> ;
    ldp:membershipPredicate rdfs:member ;
    # Delegation: the SPARQL endpoint and the triple pattern that enumerates members
    reg:delegationTarget <http://environment.data.gov.uk/sparql/bwq/query> ;
    reg:enumerationObject <http://environment.data.gov.uk/id/bathing-water/> ;
    reg:enumerationPredicate <http://reference.data.gov.uk/def/reference/uriSet> ;
    .
Then a GET request to http://location.data.gov.uk/id/bathingWaters will trigger a request to the specified SPARQL endpoint to discover all members of the register:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
CONSTRUCT
{
    <http://location.data.gov.uk/id/bathingWaters> rdfs:member ?member .
}
WHERE
{
    ?member <http://reference.data.gov.uk/def/reference/uriSet> <http://environment.data.gov.uk/id/bathing-water/> .
}
The result of this query, along with the register specification itself will be returned as the register membership contents.
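The way the enumeration query is assembled from the register description can be sketched as follows. The function name is hypothetical, and we assume that exactly one of the three pattern positions is left unset and becomes the ?member variable; the text does not spell out how an implementation must handle other combinations.

```python
RDFS_MEMBER = "http://www.w3.org/2000/01/rdf-schema#member"


def enumeration_query(register_uri, subject=None, predicate=None, obj=None,
                      membership_predicate=RDFS_MEMBER):
    """Build a CONSTRUCT query from a partial triple pattern."""
    parts = [subject, predicate, obj]
    if parts.count(None) != 1:
        raise ValueError("exactly one position of the pattern must be left open")
    # The open position becomes the variable that binds to each member.
    pattern = " ".join("?member" if p is None else "<%s>" % p for p in parts)
    return (
        "CONSTRUCT { <%s> <%s> ?member . }\n" % (register_uri, membership_predicate)
        + "WHERE { %s . }" % pattern
    )


query = enumeration_query(
    "http://location.data.gov.uk/id/bathingWaters",
    predicate="http://reference.data.gov.uk/def/reference/uriSet",
    obj="http://environment.data.gov.uk/id/bathing-water/",
)
```

The membership predicate defaults to rdfs:member, matching the LDP default, and would be replaced by the register's declared ldp:membershipPredicate where one is given.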
It is possible for the delegation target to list register members which occur within the namespace of the delegating register. The register will respond to a GET request on such an entity by first obtaining a description of the entity from the delegation target using SPARQL Describe and then returning that description.
So in the above example if the SPARQL endpoint <http://environment.data.gov.uk/sparql/bwq/query> contains a triple:
<http://location.data.gov.uk/id/bathingWaters/ukl1234>
    <http://reference.data.gov.uk/def/reference/uriSet>
    <http://environment.data.gov.uk/id/bathing-water/> .
Then a GET request to <http://location.data.gov.uk/id/bathingWaters/ukl1234> will be handled by the registry issuing:
DESCRIBE <http://location.data.gov.uk/id/bathingWaters/ukl1234>
to <http://environment.data.gov.uk/sparql/bwq/query> and returning the resulting RDF graph to the requester.
Note: such a delegated register merely acts as a container of registered entities and does not support the full register API. It does not support update of the container contents (whether via PUT, PATCH, POST or DELETE) and provides neither versioning information (_view=with_version) nor metadata (_view=with_metadata). It only supports register reading (plain GET) and the Linked Data Platform read behaviours (?non-member-properties, ?firstPage). The register metadata itself, as opposed to the register contents, can be updated using PUT or PATCH to ?non-member-properties.