
Metadata lifecycle

Cristian Vasquez edited this page Oct 18, 2024 · 6 revisions


Operational Metadata is collected to enable Data Operations.

When is the Operational Metadata collected?

Metadata can be generated at different points in the process:

  1. Before the transformation: Capturing what is already known about the pipeline and mappings.
  2. During transformation: Logging performance metrics or key events.
  3. After transformation: Recording the results and outcomes of the job.

When to collect the metadata depends on the transformation method. For example, in a streaming scenario, part-whole relations and count statistics may be captured during the transformation (step 2), while in a batch process these statistics may be recorded afterwards (step 3).
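The three collection points can be sketched as follows. This is a minimal illustration, not the pipeline's actual code; all function and field names here are hypothetical.

```python
import time

def run_job(batch_id, notices, transform):
    """Capture Operational Metadata before, during, and after a transformation."""
    # 1. Before the transformation: what is already known about the job
    metadata = {
        "batch_id": batch_id,
        "notice_count": len(notices),
        "started_at": time.time(),
        "events": [],
    }
    results = []
    for notice in notices:
        t0 = time.time()
        results.append(transform(notice))
        # 2. During the transformation: log performance metrics or key events
        metadata["events"].append({"notice": notice, "duration": time.time() - t0})
    # 3. After the transformation: record the results and outcomes
    metadata["finished_at"] = time.time()
    metadata["succeeded"] = len(results)
    return results, metadata

results, meta = run_job("batch-001", ["n1", "n2"], str.upper)
print(meta["succeeded"])  # 2
```

In a streaming setup the "during" events would be emitted continuously, while a batch job could compute the same counts in the final step.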

How is the Metadata persisted?

Data operations happen at two levels (see: Data Operations#Granularity): batches and notices. The simplest approach is to maintain one JSON document per batch and per notice, containing its Operational Metadata.

Note: the metadata can be stored in the current databases, simplifying https://github.com/OP-TED/ted-rdf-conversion-pipeline/issues/553. Alternatively, metadata can be logged as quads in a TriG file, using named graphs.
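The one-JSON-document-per-batch-or-notice approach can be sketched as below. The directory layout and function name are assumptions for illustration, not the pipeline's actual storage scheme.

```python
import json
import tempfile
from pathlib import Path

def persist_metadata(store_dir, kind, identifier, metadata):
    """Write one JSON document per batch or notice (hypothetical layout:
    <store_dir>/<kind>/<identifier>.json)."""
    path = Path(store_dir) / kind / f"{identifier}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(metadata, indent=2))
    return path

# Usage: persist the metadata of a single notice
store = tempfile.mkdtemp()
p = persist_metadata(store, "notices", "notice-123", {"status": "transformed"})
print(p.name)  # notice-123.json
```

One document per entity keeps reads and writes trivially simple; the TriG alternative would instead serialize the same facts as quads inside a named graph per batch or notice.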

How is the Metadata consumed?

It should be possible to query the metadata to support Data Operations. Additionally, the stored metadata of batches and notices must be accessible to downstream systems through a URL, making it easy to consume. The metadata can later be transformed into RDF to be linked or included in a Data Catalog.
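A query over the stored documents might look like the sketch below. The in-memory list stands in for documents that downstream systems would fetch through a URL; the document fields and the `query` helper are hypothetical.

```python
# Stand-in for stored notice metadata documents; in practice each would be
# retrievable through a URL (exact endpoint layout is an assumption).
documents = [
    {"notice_id": "n1", "status": "transformed"},
    {"notice_id": "n2", "status": "failed"},
    {"notice_id": "n3", "status": "transformed"},
]

def query(docs, **criteria):
    """Return the documents matching every key/value criterion."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]

failed = query(documents, status="failed")
print([d["notice_id"] for d in failed])  # ['n2']
```

The same filters could be expressed as database queries over the JSON documents, or as SPARQL once the metadata is lifted to RDF.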

How is the metadata updated?

There is exactly one metadata document for each Batch or Notice. A proposed approach for updating it is as follows:

  • On Success: The metadata document is upserted, ensuring that the most recent information is always available.
  • On Failure: Failure events are appended to the existing metadata document. This maintains a history of failures until the job succeeds.
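The two update rules above can be sketched as a single function. This is an illustrative sketch against a file-per-document store; the document shape and field names are assumptions.

```python
import json
import tempfile
from pathlib import Path

def update_metadata(path, result):
    """On success, upsert the document; on failure, append the failure
    event so the history is kept until the job succeeds."""
    path = Path(path)
    if result["status"] == "success":
        # Upsert: replace the document so the most recent info wins
        doc = {"status": "success", "result": result, "failures": []}
    else:
        # Append: keep the existing document and add the failure event
        doc = json.loads(path.read_text()) if path.exists() else {}
        doc.setdefault("failures", []).append(result)
        doc["status"] = "failed"
    path.write_text(json.dumps(doc, indent=2))
    return doc

path = Path(tempfile.mkdtemp()) / "notice-123.json"
update_metadata(path, {"status": "failure", "error": "timeout"})
doc = update_metadata(path, {"status": "failure", "error": "bad mapping"})
print(len(doc["failures"]))  # 2
doc = update_metadata(path, {"status": "success"})
print(doc["status"])  # success
```

Resetting the failure list on success reflects the rule that the history is only kept "until the job succeeds"; a variant could instead archive past failures in the successful document.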