This repository has been archived by the owner on Sep 11, 2023. It is now read-only.

catenax-ng/product-agents-edc

Tractus-X Knowledge Agents EDC Extensions (KA-EDC)


KA-EDC is a product of the Catena-X Knowledge Agents Kit implementing the core modules of the CX-0084 standard (Federated Queries in Dataspaces).

About the Project

This repository hosts the relevant reference extensions to the Eclipse Dataspace Components (EDC). It provides container images and deployments for a ready-made KA-enabled Tractus-X EDC.

In particular, KA-EDC consists of

  • Common extensions that allow secure and personalized application access to the EDC infrastructure.
  • Agent (Data) Plane extensions to ingest, validate, process and delegate federated procedure calls (so-called Skills) on top of data and functional assets. In particular, they implement the Semantic Web SPARQL protocol.
  • Helm Charts for umbrella deployments.

Source Code Layout & Runtime Collaboration

[Figure: Source Code]

Above is a collaboration map of the main implementation classes found in this repository.

It starts with an application performing a SPARQL call against the Consumer's AgentController of the Agent Protocol Data Plane Extension. This call may be handled by an AuthenticationService. Using the configuration facilities of the JWT Auth Extension, which sets up either a single JwtAuthenticationService or a composed CompositeAuthenticationService, the handler stack may analyse diverse authorisation features of the incoming request, such as checking a JWT-based bearer token for validity against multiple OpenID servers by means of the CompositeJwsVerifier.
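The composition idea behind such a verifier can be illustrated with a minimal sketch. It is written in Python for brevity and is not the repository's code: the names CompositeVerifierSketch and is_authenticated are hypothetical, the per-issuer verifiers are plain predicates rather than real JWS checks, and the "accept if any verifier accepts" rule is an assumption about how several OpenID servers might be combined.

```python
from typing import Callable, Iterable, Mapping

# Hypothetical stand-in for a per-issuer verifier: returns True when the
# bearer token is acceptable. No real JWS signature check is performed here.
Verifier = Callable[[str], bool]


class CompositeVerifierSketch:
    """Accepts a token as soon as any configured verifier accepts it (assumed rule)."""

    def __init__(self, verifiers: Iterable[Verifier]) -> None:
        self._verifiers = list(verifiers)

    def verify(self, token: str) -> bool:
        return any(verifier(token) for verifier in self._verifiers)


def is_authenticated(headers: Mapping[str, str], verifier: CompositeVerifierSketch) -> bool:
    """Extract a bearer token from the Authorization header and verify it."""
    value = headers.get("Authorization", "")
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return False
    return verifier.verify(token.strip())
```

In this sketch a request without an Authorization header, or with a scheme other than Bearer, is rejected before any verifier is consulted.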

The AgentController delegates the call, after preprocessing (e.g. resolving local Skill Asset references using the EdcSkillStore), to the actual SparqlQueryProcessor (an instance of an Apache Jena SPARQL query processor). The SparqlQueryProcessor is backed by an RDFStore, which hosts the Federated Data Catalogue and is regularly synchronized by the DataspaceSynchronizer.
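The preprocessing step of resolving a Skill Asset reference could look roughly like the following sketch. It is an illustration only: SkillStoreSketch, resolve_query and the asset identifier in the usage note are hypothetical, and the precedence of an inline query over a skill reference is an assumption, not documented behaviour of the EdcSkillStore.

```python
from typing import Dict, Optional


class SkillStoreSketch:
    """In-memory stand-in for a skill store: maps asset ids to SPARQL text."""

    def __init__(self) -> None:
        self._skills: Dict[str, str] = {}

    def put(self, asset_id: str, query_text: str) -> None:
        self._skills[asset_id] = query_text

    def get(self, asset_id: str) -> Optional[str]:
        return self._skills.get(asset_id)


def resolve_query(query: Optional[str], asset: Optional[str], store: SkillStoreSketch) -> str:
    """Return the SPARQL text to execute: an inline query first, else the stored skill."""
    if query:
        return query
    if asset:
        stored = store.get(asset)
        if stored is None:
            raise KeyError(f"unknown skill asset: {asset}")
        return stored
    raise ValueError("request carries neither a query nor a skill asset reference")
```

For example, after `store.put("urn:skill:demo", "SELECT * WHERE { ?s ?p ?o }")`, calling `resolve_query(None, "urn:skill:demo", store)` returns the stored query text, which would then be handed on to the query processor.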

Whenever external SERVICE references in a SPARQL query are to be executed, the SparqlQueryProcessor asks the DataspaceServiceExecutor to execute the actual sub-operation. Depending on the actual query binding context, this operation could point to multiple endpoints, either tenant-internal or public. The operation may also need to be batched if there are too many bindings to transfer in one go (see the maxBatchSize parameter in the Agent Protocol Data Plane Extension). The operation could also point to dataspace addresses (indicated by URLs starting with the edc:// or edcs:// schemes). In this latter case, the DataspaceServiceExecutor asks the AgreementController for help.
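Two of the decisions described above, recognising a dataspace address by its scheme and splitting bindings into batches, can be sketched as follows. This is a simplified illustration, not the DataspaceServiceExecutor itself: the function names are hypothetical, bindings are modelled as plain dictionaries, and max_batch_size merely plays the role of the maxBatchSize parameter.

```python
from typing import Dict, List, Sequence
from urllib.parse import urlsplit

Binding = Dict[str, str]

# Schemes named in the text as indicating dataspace addresses.
DATASPACE_SCHEMES = {"edc", "edcs"}


def is_dataspace_address(service_url: str) -> bool:
    """True if the SERVICE target uses one of the dataspace schemes."""
    return urlsplit(service_url).scheme.lower() in DATASPACE_SCHEMES


def split_into_batches(bindings: Sequence[Binding], max_batch_size: int) -> List[List[Binding]]:
    """Split the bindings into consecutive chunks of at most max_batch_size items."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    return [
        list(bindings[start:start + max_batch_size])
        for start in range(0, len(bindings), max_batch_size)
    ]
```

Each batch would then be sent as its own sub-operation, with a dataspace address first requiring an agreement lookup while an ordinary http(s) endpoint can be called directly.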

The AgreementController keeps track of already negotiated Dataspace Assets and the corresponding EndpointDataReferences (EDRs). If such an EDR does not yet exist, it negotiates one using the EDC control plane with the help of the DataManagement facade. The resulting EDR is handed to the AgreementController asynchronously and finally returned to the DataspaceServiceExecutor to perform the Dataspace Call (effectively tunneling the SPARQL protocol through EDC's HttpProxy transfer).
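The bookkeeping described above amounts to a "look up, else negotiate and remember" pattern, sketched below under strong simplifications. AgreementCacheSketch is a hypothetical name, the negotiate callback stands in for the control-plane interaction, EDRs are modelled as opaque strings, and the asynchronous hand-over and any expiry of EDRs are left out.

```python
from typing import Callable, Dict


class AgreementCacheSketch:
    """Remembers one EDR per asset and negotiates only when none is known."""

    def __init__(self, negotiate: Callable[[str], str]) -> None:
        # negotiate is a hypothetical callback standing in for the
        # control-plane negotiation; it returns an opaque EDR string.
        self._negotiate = negotiate
        self._edrs: Dict[str, str] = {}

    def get_or_negotiate(self, asset_id: str) -> str:
        edr = self._edrs.get(asset_id)
        if edr is None:
            edr = self._negotiate(asset_id)
            self._edrs[asset_id] = edr
        return edr
```

With this shape, repeated calls for the same asset trigger only one negotiation, which is the point of keeping the EDRs around.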

When the call arrives at the Provider's Data Plane, it hits the AgentSource. Mirroring the Consumer's AgentController, the AgentSource performs some preprocessing and validity checking before finally delegating to the Provider's SparqlQueryProcessor (from where the recursion may continue).

Getting Started

Build

To compile, package and containerize the binary artifacts (this includes running the unit tests):

mvn package -Pwith-docker-image

To publish the binary artifacts (the environment variables GITHUB_ACTOR and GITHUB_TOKEN must be set):

mvn -s settings.xml publish

Deployment