Conceptual architecture

der edited this page Dec 4, 2012 · 13 revisions

Overview

The diagram below shows the high level functional blocks needed for a registry service implementation. This is not a formal component architecture. There are many ways that the registry service could be structured internally. The only formal component interfaces are the external interfaces defined in the Api specification which in turn reference open standard specifications, particularly SPARQL 1.1. The example functional breakdown shown here is meant to be illustrative, in order to understand the functionality that must be delivered, rather than prescriptive.

Component descriptions

Dispatcher

All requests to the registry service are made over HTTP connections which are initially received by the dispatcher.

The dispatcher has the following responsibilities:

Forwarding. Parts of the registry URI space can be configured to forward to other remote services by means of an HTTP (status code 30x) response. This is configured by means of NamespaceForward or FederatedRegister registrations. The dispatcher provides an internal interface by which the registry core can dynamically update the set of forwarding rules currently in effect.

Proxy. Optionally a registry service may support proxy forwarding as well as redirection. The advantage of this is that the delegation target URIs do not appear in the HTTP response. The disadvantage is that it raises scaling challenges, because the connection to the proxied service must be held open until a response (or timeout) is received, and such connections can be a significant limiting factor on some implementation platforms. A registry service which chooses to implement proxy support MAY choose to aggressively cache remote responses to manage this cost.

Request filtering. Requests which are not forwarded MAY be subjected to additional filtering rules including, but not limited to, rejection of blocked user agents (e.g. banned crawlers), rejection of blocked IP addresses and request throttling to manage server load, especially in response to suspected denial of service attacks.

Dispatch. All requests that are neither forwarded nor filtered are then dispatched to the relevant internal service. An implementation may choose to partition and structure the internal services and dispatch processing in many ways, including internal HTTP routing to replicated or specialized internal servers. In that case the dispatcher acts as a reverse proxy for the internal service architecture.

Logging. All external requests should be logged using normal web server logging practices.

In the example functional partitioning shown in this conceptual architecture there are two dispatch targets: the request processor, which handles all registry-specific API requests, and the RDF store, to which raw SPARQL queries are routed.
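
The order in which the dispatcher applies its responsibilities can be sketched as a single decision function. This is only an illustration under stated assumptions: the `Decision` type, the layout of the rule table, the use of a 403 status for filtered requests and the `/sparql` path used to recognise raw SPARQL queries are all hypothetical and not defined by the registry specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    kind: str                      # "redirect", "reject" or "dispatch"
    status: Optional[int] = None   # HTTP status for a redirect or rejection
    target: Optional[str] = None   # redirect URL or internal service name


def decide(path, user_agent, forward_rules, blocked_agents):
    """Apply forwarding, then filtering, then dispatch.

    forward_rules maps a path prefix (no trailing slash) to a
    (status_code, remote_base_url) pair; this layout is an assumption.
    """
    # 1. Forwarding: the longest matching path prefix wins.
    matches = [p for p in forward_rules if path == p or path.startswith(p + "/")]
    if matches:
        prefix = max(matches, key=len)
        status, remote = forward_rules[prefix]
        return Decision("redirect", status, remote + path[len(prefix):])
    # 2. Request filtering: only requests that were not forwarded get here.
    if user_agent in blocked_agents:
        return Decision("reject", 403)
    # 3. Dispatch: raw SPARQL goes to the RDF store, the rest to the request processor.
    if path == "/sparql":
        return Decision("dispatch", target="rdf-store")
    return Decision("dispatch", target="request-processor")
```

Because the rule table is a plain in-memory mapping, the registry core could replace it whenever forwarding registrations change, without the dispatcher querying the RDF store on each request.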

Request processor

The request processor takes all registry API requests passed by the dispatcher and determines if they are authorised to proceed. This involves:

  • converting the request payload to a normalised internal RDF representation
  • identifying the action requested and action target
  • authenticating the requesting user
  • testing if the user is authorised for the requested action on the target resource
  • logging the request and authorisation outcome in an audit log trail

If the action is authorised it is passed to an appropriate internal registry interface supplied by the registry core.

The registry service is required to support a range of MIME types in which the RDF request payload can be supplied. The mandatory types are application/rdf+xml and text/turtle. Optionally application/ld+json may be supported. The request processor is responsible for converting these request syntaxes into an internal RDF form for supply to the registry core.
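
As a minimal sketch of the first step, the request processor might map the request `Content-Type` onto a parser choice as follows. The function name and the parser labels are hypothetical; only the three MIME types come from the text above.

```python
# MIME types named in the text; the parser labels are illustrative only.
MANDATORY_TYPES = {"application/rdf+xml": "rdfxml", "text/turtle": "turtle"}
OPTIONAL_TYPES = {"application/ld+json": "jsonld"}


def payload_syntax(content_type, support_optional=False):
    """Return a parser label for the Content-Type, or None if unsupported."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in MANDATORY_TYPES:
        return MANDATORY_TYPES[media_type]
    if support_optional and media_type in OPTIONAL_TYPES:
        return OPTIONAL_TYPES[media_type]
    return None
```

A caller would typically reject the request when `None` is returned and otherwise hand the body to the matching RDF parser.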

Auth

The auth module provides the authentication and authorisation services used by the request processor.

User authentication should be based on standard web service security practices, for example Basic Authentication over HTTPS. The credentials database for user authentication is a matter for the service implementation but SHOULD be separated from the registry RDF store.

The authorisation process should support a role-based access control mechanism whereby users are assigned to roles and roles are permitted a range of actions on each sub-tree of the registry namespace.
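
A role-based check of this kind can be sketched in a few lines. The data layout (two dictionaries) and the action names used in the usage note are assumptions made for illustration; the specification does not prescribe them.

```python
def is_authorised(user, action, target, user_roles, role_grants):
    """Check whether any of the user's roles permits the action on the target.

    user_roles maps a user to a set of role names.
    role_grants maps a role to a list of (subtree_path, permitted_actions).
    """
    for role in user_roles.get(user, ()):
        for subtree, actions in role_grants.get(role, ()):
            base = subtree.rstrip("/")
            # The grant covers the subtree root and everything beneath it.
            in_subtree = target == base or target.startswith(base + "/")
            if in_subtree and action in actions:
                return True
    return False
```

For example, with `user_roles = {"alice": {"editor"}}` and `role_grants = {"editor": [("/codes", {"register", "update"})]}`, the user `alice` may register under `/codes/colours` but not under the sibling path `/codesets`.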

In some registry service settings the roles and the users assigned to those roles may be publicly accessible in which case the role and role-user bindings MAY be held in the registry data store and/or may be accessible as configuration registers under the /service reserved namespace.

Audit trail

The audit trail records all requests at the level of registry actions and the associated authorised user. This supplements, rather than replaces, the dispatcher’s web server logs. The audit trail provides for more convenient application-level trace back, which can help with both service maintenance and verification.

It is up to the service operator to define the level of audit logging required for a particular implementation. This may range from no audit logging at all through to a complete journaled record sufficient to enable the entire registry state to be replicated in the event of complete storage and storage backup failure.

Registry core

This module provides all of the “business logic” to implement the registry API actions as a set of transformations of the RDF representations held in the RDF store.

We describe its operation in terms of several sub-modules:

Update logic

The bulk of the defined registry API provides a set of ways in which the registry state can be updated. Specifically:

  • registration of new entities within a register
  • creation of new sub-registers
  • updating the description of a registered entity (whether the canonical description of a managed entity or the registered partial description of a referenced entity)
  • updating the metadata describing a registered entity (register item)
  • changing the status of a register item, a special case of the previous item
  • deletion of registers and registered items (equivalent to changing the status to invalid)

The update logic provides implementations of each of these specific actions.

In all cases a set of action-specific validation steps is run before the action is permitted. The validation module provides common facilities reused by the separate action functions.

If the action is validated, the logic for each action involves updating one or more resources (a register, a register item or an entity reference). In each case there are API-specific rules to determine the new desired state, including merging of current and submitted state, default-override rules and automated properties. These are specified in the Api design.

Validation

The validation module provides checking of an update payload before permitting the operation to proceed. This includes:

  • testing URI conformance, that the payload URIs match the target resource
  • life cycle enforcement so that life cycle transitions are restricted to the transitions described in the life cycle state diagram and that immutability of properties is enforced for accepted items
  • per-register enforcement of validation rules specified via the reg:validationQuery register property
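
The first two checks can be sketched as below. Both rules are stand-ins: the transition table is hypothetical (the real one is the life cycle state diagram), and the URI conformance rule shown, that an entity must be a direct child of the target register, is an assumption about what "match" means.

```python
# Hypothetical transition table; the real one is the life cycle state diagram.
ALLOWED_TRANSITIONS = {
    "submitted": {"accepted", "invalid"},
    "accepted": {"deprecated"},
    "deprecated": set(),
    "invalid": set(),
}


def validate_update(target_register, entity_uri, old_status, new_status):
    """Return a list of error strings; an empty list means the update may proceed."""
    errors = []
    base = target_register.rstrip("/") + "/"
    # URI conformance (assumed rule): the entity is a direct child of the register.
    local = entity_uri[len(base):] if entity_uri.startswith(base) else ""
    if not local or "/" in local:
        errors.append("entity URI does not match target register")
    # Life cycle enforcement: an unchanged status is fine, otherwise consult the table.
    if new_status != old_status and new_status not in ALLOWED_TRANSITIONS.get(old_status, set()):
        errors.append(f"transition {old_status} -> {new_status} not allowed")
    return errors
```

Collecting all errors rather than failing on the first one lets the service report every problem with a payload in a single response.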

Version management

Any successful update to the register state results in the creation of a new version of a register, register item or both. The versioning model provides details of how this version history of resources is to be represented in RDF. The version management module is responsible for implementing this versioning model, including management of the associated named graphs.
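
The idea that every update adds a version instead of overwriting state can be illustrated with a toy in-memory store. The class, and the `resource:N` naming of version graphs, are hypothetical; the actual representation is defined by the versioning model.

```python
class VersionedStore:
    """Toy stand-in for a named-graph store: one graph per resource version."""

    def __init__(self):
        self.graphs = {}   # graph name -> frozenset of (subject, predicate, object)
        self.latest = {}   # resource URI -> latest version number

    def update(self, resource, triples):
        """Record a new version; earlier versions are never overwritten."""
        version = self.latest.get(resource, 0) + 1
        self.graphs[f"{resource}:{version}"] = frozenset(triples)
        self.latest[resource] = version
        return version

    def get(self, resource, version=None):
        """Return the triples of the requested version (latest by default)."""
        version = self.latest[resource] if version is None else version
        return self.graphs[f"{resource}:{version}"]
```

After two calls to `update` for the same resource, `get(resource)` returns the second description while `get(resource, 1)` still returns the first.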

Search and retrieval

The registry API provides for retrieval of individual resources. It also supports search operations which traverse a register subtree testing for registration of a particular entity (entity retrieval), of a set of entities (validation) or of all entities matching a set of search criteria (search query).

The search module translates such implicit and explicit search requests into either a SPARQL query (routed to the RDF store) or a free text query (routed to the text index) or a mix of the two.

If the register tree contains one or more federated registers then the search interface must pass the search query on to those federated registry services via the federation manager. Similarly if the register tree contains one or more delegated registers then the search must be converted to a SPARQL query over the delegated SPARQL endpoint, again via the federation manager.

The cost of filtering the search to the target register hierarchy may be non-trivial. An implementation may implement some interval coding scheme for register containment and is permitted to record such information as part of the RDF representation of registers.
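
One possible interval coding scheme is sketched below: a depth-first walk assigns each register a `(low, high)` pair, so that testing whether one register lies within another becomes a pair of integer comparisons. The function names and the dictionary-based tree are illustrative assumptions.

```python
def assign_intervals(children, root):
    """Label each register with (low, high) so containment is an interval test.

    children maps a register to the list of its direct sub-registers.
    """
    intervals = {}
    counter = 0

    def visit(node):
        nonlocal counter
        low = counter            # number assigned on entering the node
        counter += 1
        for child in children.get(node, []):
            visit(child)
        intervals[node] = (low, counter)   # number assigned on leaving the node
        counter += 1

    visit(root)
    return intervals


def contains(intervals, ancestor, descendant):
    """True if descendant is the ancestor itself or lies anywhere below it."""
    a_low, a_high = intervals[ancestor]
    d_low, d_high = intervals[descendant]
    return a_low <= d_low and d_high <= a_high
```

If the two numbers were stored as properties of each register, a search restricted to a register subtree could filter on a numeric range rather than walking the containment hierarchy. The trade-off is that inserting a new sub-register may require renumbering part of the tree.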

View generation

Both the individual retrieval and the search APIs support a range of different view modifiers as discussed in the Api (paging, non-member-properties, with-metadata and with-version views). The view generation module implements these modifiers – retrieving the correct version of each entity in the response, retrieving the associated register item if required and merging version/versioned-thing views.

Renderer

This module takes the response RDF and renders it according to the client request accept headers.

In the simple case (application/rdf+xml, text/turtle and optionally application/ld+json) then this rendering is straightforward serialization of the response RDF into the requested RDF syntax.
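
A simplified sketch of choosing the response syntax from the `Accept` header follows. It handles q-values and the `*/*` wildcard only; partial ranges such as `text/*` are ignored, and the ordering of `SUPPORTED` (which decides what `*/*` resolves to) is an assumption.

```python
SUPPORTED = ["text/turtle", "application/rdf+xml", "application/ld+json", "text/html"]


def choose_media_type(accept_header, supported=SUPPORTED):
    """Pick the supported media type with the highest q-value, or None."""
    best, best_q = None, 0.0
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0   # treat a malformed q-value as "not acceptable"
        # "*/*" resolves to the first supported type; type/* ranges are ignored here.
        candidate = supported[0] if media_type == "*/*" else media_type
        if candidate in supported and q > best_q:
            best, best_q = candidate, q
    return best
```

When `None` is returned the service has no acceptable representation to offer and would normally signal that to the client.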

For text/html the registry service is required to provide human readable web pages which make it possible for a user to navigate the registry via a browser. The precise format and structure of these pages is at the discretion of the registry service implementation. However, implementations are likely to provide a means (style sheets, rendering templates) to allow a given instance to tune the look & feel of the service.

The logical base URI of a registry service may differ from the physical DNS domain at which it is hosted (e.g. to support a staging version of a registry service to allow evaluation and testing, or delegation of some part(s) of the registry namespace to another party).

HTML views of each registry resource shall enable navigation within the physical deployment domain; HTML links in such views target the physical deployment URL of the service.
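
The link rewriting this implies can be sketched as a small helper used by the HTML renderer. The function name and the example hosts in the usage note are hypothetical.

```python
def to_physical_url(uri, logical_base, physical_base):
    """Map a registry URI under the logical base to the deployed (physical) URL."""
    logical = logical_base.rstrip("/")
    if uri == logical or uri.startswith(logical + "/"):
        return physical_base.rstrip("/") + uri[len(logical):]
    return uri  # URIs outside the registry namespace are left untouched
```

For instance, with a logical base of `http://reg.example` and a staging deployment at `http://staging.example:8080/reg`, the resource `http://reg.example/def/a` would be linked as `http://staging.example:8080/reg/def/a`, while the RDF data itself continues to use the logical URI.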

RDF store

The registry state is maintained in an RDF store.

The versioning model requires multiple states of the same resource to be described in the same store using named graphs.

Direct access to the RDF store is provided via a SPARQL 1.1 compliant query endpoint.

A particular registry service implementation is free to architect this facility in different ways. The RDF store could be an embedded part of the registry service implementation which could directly expose a SPARQL 1.1 query facility. Alternatively the store could be run as a separate service, possibly replicated. If the store itself is separated then the registry service may use the standard SPARQL 1.1 Update protocol for routing compiled updates to the store as well as using SPARQL 1.1 Query for all retrievals.

Note that the SPARQL endpoint exposed to registry users MAY be a replicated instance of the operational store for performance and/or security reasons.

Text index

The registry search specification requires support for free text search.

While this is a common feature of many RDF store implementations it is not a formal part of the SPARQL specification and so is shown as a separate component of the conceptual architecture. A particular registry service may implement this as an integral part of the main RDF store, as a separate but embedded text search index or as a separate distributed text search service.

Federation manager

The federation manager is responsible for handling requests to both federated and delegated registers.

The federation manager MAY provide caching of search requests, including a common framework for cache timeout, to enable a balance between adequate performance of federated queries and latency of update propagation. Registry service implementations may choose different trade-offs here.
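
The trade-off between query performance and update latency comes down to the cache timeout. A minimal time-to-live cache, assuming a hypothetical `fetch` callable that performs the remote federated query, might look like this:

```python
import time


class TTLCache:
    """Cache remote search results for a fixed time-to-live."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so the cache can be tested
        self.entries = {}       # key -> (expiry time, value)

    def get_or_fetch(self, key, fetch):
        """Return a cached value if still fresh, otherwise call fetch(key)."""
        now = self.clock()
        hit = self.entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = fetch(key)
        self.entries[key] = (now + self.ttl, value)
        return value
```

A longer `ttl_seconds` reduces load on the federated services but means that changes made there take longer to become visible through the local registry.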

The federation manager MAY actively harvest resources from federated and delegated registers for local indexing, in particular local free-text indexing. A full framework for active change notification is out of scope for the current specification so similar time-to-live considerations apply here as for reactive caching.

Access UI

The registry operation is specified in terms of a web service API so that a diversity of clients can access and update a given registry service.

In addition a registry service should provide a basic web user interface to allow registry users to at least:

  • discover resources
  • assess the status of a resource

This would typically comprise:

  • a search page to support free-text search
  • a retrieval and validation page which can find all entries for a given entity or set of entities
  • rendering for individual entries which makes clear both the description of the entity and the current status (as recorded in the registry item metadata)
  • a navigation facility to allow navigation from an entry to its containing register and from a register down to contained items and up to containing registers
  • a navigation facility to follow links to previous versions of items.

All such access user interfaces SHOULD be implemented as separate services which internally call the same registry web service API as any other client implementations.

Admin UI

The process and workflow management for submission, review and approval of registry entries is outside the scope of the registry service. The registry service simply provides the technical API through which the results of the process can be recorded and accessed. The registry service MAY provide a web forms interface for management of register entries or may delegate the interface to custom client implementations which access the registry service via the specified API.

In addition to any register management facilities a registry service should provide some facility for administration of the service itself including:

  • administration of access and authorisation (users, user credentials, user roles)
  • administration of the service itself (backups, rendering templates, cache management etc)

The precise scope and nature of this administration interface is at the discretion of the service implementations.

Scaling considerations

As noted in the overview, this conceptual architecture is intended to show the functional components required. Implementers may choose a variety of different approaches to deliver the specified API. Here we point out some of the considerations that implementers may need to take into account.

Volume scaling

The primary envisioned usage for a registry service is the registration of data sources, reference identifiers, code lists and so forth. In such applications the sheer number of items to be managed is likely to be modest.

The API design is suited to a simple implementation strategy based on an underlying RDF “triple” store, as illustrated in this conceptual architecture. Assuming O(5) versions of each registered item and O(40) triples per registered item version then registries up to around 5 million registered items can be supported on a single server instance running open source triple stores.
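
The store size implied by these figures can be worked out directly. The calculation below reads O(5) and O(40) as "roughly 5" and "roughly 40", which is an interpretation and not something the text states precisely.

```python
# Order-of-magnitude figures taken from the paragraph above.
versions_per_item = 5
triples_per_version = 40
registered_items = 5_000_000

total_triples = versions_per_item * triples_per_version * registered_items
print(f"{total_triples:,} triples")  # prints "1,000,000,000 triples"
```

In other words, the 5 million item estimate corresponds to a store of about one billion triples.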

The registry service design provides for various forms of delegation so that substantially more entities may be accessible through the registry namespace than are physically stored in the registry service itself.

Note that the natural implementation of the registry API using a triple store uses named graphs within the same RDF dataset to hold the different views of each entity. The implementer should thus select a triple store implementation that can scale to large numbers of named graphs.

For higher volumes of registered items then a number of alternative implementation strategies are possible:

  • use of a clustered triple store implementation (open source or commercial)
  • separated storage of the named graphs for prior versions, using the live triple store only for the latest version of each resource (may give order of magnitude scaling benefit depending on use of versioning)
  • use of a non-triple store solution, such as a horizontally scalable key-value store

Request scaling – forwarding

A key part of the registry service design is the support for namespace forwarding and register federation (see delegation). The conceptual design supports this through use of the dispatcher. The dispatcher should be designed to support high throughput of forward requests and so should not perform live queries to the registry RDF store as part of normal forwarding. Instead, the registry core should compile any registrations of reg:NamespaceForward or reg:FederatedRegister into a set of proxy forward rules for the dispatcher to act on.

Implementers may reuse industry standard scalable proxy implementations as components within a dispatcher design so long as they are able to dynamically update the proxy configuration from the registry core logic.

The option to harvest and cache entries in federated registers for high performance search is afforded by the Federation Manager component within the conceptual architecture.

Request scaling – access

Access to registered items or complete registers normally requires execution of the registry core logic which in turn will perform a set of SPARQL queries to retrieve the entity descriptions from the registry RDF store.

For registry instances which are expected to handle a high volume of lookup requests, implementers should take into consideration caching and replication options.

The registry RDF store may be implemented as a replicated service to enable higher throughput of queries.

The formatted API response may be cached using standard HTTP cache implementations (typically as part of the dispatcher).

Implementers may also consider caching at the RDF store interface, for example by maintaining precomputed RDF views of each entity in a high performance cache.

Request scaling – update

Registration of new entities or updating the status of an item entails non-trivial logic in the registry core together with a number of SPARQL queries, which result in a SPARQL Update to the registry state.

For normal patterns of registry use very high volumes of update requests are unlikely, so a normal three tier architecture is likely to be sufficient. The business logic (registry core) may be horizontally replicated if necessary to handle update throughput. The database (registry RDF store) implementation should be chosen to handle the anticipated update rates.

For extreme cases then a non-RDF storage solution may be considered.

SPARQL support

A registry service MAY omit the SPARQL endpoint from the Search and Discovery API in the interests of scaling.

In the absence of a SPARQL endpoint, other implementations of the underlying storage solution, including high performance key-value stores, are possible.

Furthermore the SPARQL endpoint is a possible source of very high cost requests which may impact other users. As noted earlier an implementation which does provide a SPARQL endpoint MAY do so via a separate store replica which isolates registry operation from such access requests. This will entail some latency between successful change of registry state and visibility of the change through the SPARQL endpoint.

Component phasing

Component          | Proof of Concept                            | Deployment
-------------------|---------------------------------------------|-------------------------
Dispatcher         | Required, non scalable                      | Scalable forwarding
Request processor  | Required                                    | Required
Auth               | No                                          | Required
Registry core      | Partial, some operations may be stubbed out | Full
RDF store          | Required                                    | Required
Text index         | Local index                                 | Index external registers
Federation manager | Stub                                        | Full
Renderer           | Required                                    | Required
Access UI          | Demonstration level                         | Full
Admin UI           | No                                          | As appropriate